<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta http-equiv="Content-Style-Type" content="text/css"/>
<title>flat assembler</title>
<link rel="stylesheet" href="docs.php_files/fasm.css" type="text/css"/>
<style type="text/css">
  html, body { background-color: #F0D4B0; }
</style>
</head>

<body>

  <p class="mediumtext">
    <span class="maintitle">flat assembler</span><br/>Documentation and tutorials.
  </p>


  <div class="container">
     

<p><b>
<span class="largetext">flat assembler 1.68</span><br/>
<span class="mediumtext">Programmer's Manual</span><br/>
</b></p>

<p><b>
<span class="largetext">Table of Contents</span><br/>
<br/><span class="mediumtext">Chapter 1 - Introduction</span><br/>
<br/><a href="#1.1" class="mediumtext" style="padding-left: 20pt;">1.1  Compiler overview</a><br/>
<a href="#1.1.1" class="smalltext" style="padding-left: 40pt;">1.1.1  System requirements</a><br/>
<a href="#1.1.2" class="smalltext" style="padding-left: 40pt;">1.1.2  Executing compiler from command line</a><br/>
<a href="#1.1.3" class="smalltext" style="padding-left: 40pt;">1.1.3  Compiler messages</a><br/>
<a href="#1.1.4" class="smalltext" style="padding-left: 40pt;">1.1.4  Output formats</a><br/>
<br/><a href="#1.2" class="mediumtext" style="padding-left: 20pt;">1.2  Assembly syntax</a><br/>
<a href="#1.2.1" class="smalltext" style="padding-left: 40pt;">1.2.1  Instruction syntax</a><br/>
<a href="#1.2.2" class="smalltext" style="padding-left: 40pt;">1.2.2  Data definitions</a><br/>
<a href="#1.2.3" class="smalltext" style="padding-left: 40pt;">1.2.3  Constants and labels</a><br/>
<a href="#1.2.4" class="smalltext" style="padding-left: 40pt;">1.2.4  Numerical expressions</a><br/>
<a href="#1.2.5" class="smalltext" style="padding-left: 40pt;">1.2.5  Jumps and calls</a><br/>
<a href="#1.2.6" class="smalltext" style="padding-left: 40pt;">1.2.6  Size settings</a><br/>
<br/><span class="mediumtext">Chapter 2 - Instruction Set</span><br/>
<br/><a href="#2.1" class="mediumtext" style="padding-left: 20pt;">2.1  The x86 architecture instructions</a><br/>
<a href="#2.1.1" class="smalltext" style="padding-left: 40pt;">2.1.1  Data movement instructions</a><br/>
<a href="#2.1.2" class="smalltext" style="padding-left: 40pt;">2.1.2  Type conversion instructions</a><br/>
<a href="#2.1.3" class="smalltext" style="padding-left: 40pt;">2.1.3  Binary arithmetic instructions</a><br/>
<a href="#2.1.4" class="smalltext" style="padding-left: 40pt;">2.1.4  Decimal arithmetic instructions</a><br/>
<a href="#2.1.5" class="smalltext" style="padding-left: 40pt;">2.1.5  Logical instructions</a><br/>
<a href="#2.1.6" class="smalltext" style="padding-left: 40pt;">2.1.6  Control transfer instructions</a><br/>
<a href="#2.1.7" class="smalltext" style="padding-left: 40pt;">2.1.7  I/O instructions</a><br/>
<a href="#2.1.8" class="smalltext" style="padding-left: 40pt;">2.1.8  Strings operations</a><br/>
<a href="#2.1.9" class="smalltext" style="padding-left: 40pt;">2.1.9  Flag control instructions</a><br/>
<a href="#2.1.10" class="smalltext" style="padding-left: 40pt;">2.1.10  Conditional operations</a><br/>
<a href="#2.1.11" class="smalltext" style="padding-left: 40pt;">2.1.11  Miscellaneous instructions</a><br/>
<a href="#2.1.12" class="smalltext" style="padding-left: 40pt;">2.1.12  System instructions</a><br/>
<a href="#2.1.13" class="smalltext" style="padding-left: 40pt;">2.1.13  FPU instructions</a><br/>
<a href="#2.1.14" class="smalltext" style="padding-left: 40pt;">2.1.14  MMX instructions</a><br/>
<a href="#2.1.15" class="smalltext" style="padding-left: 40pt;">2.1.15  SSE instructions</a><br/>
<a href="#2.1.16" class="smalltext" style="padding-left: 40pt;">2.1.16  SSE2 instructions</a><br/>
<a href="#2.1.17" class="smalltext" style="padding-left: 40pt;">2.1.17  SSE3 instructions</a><br/>
<a href="#2.1.18" class="smalltext" style="padding-left: 40pt;">2.1.18  AMD 3DNow! instructions</a><br/>
<a href="#2.1.19" class="smalltext" style="padding-left: 40pt;">2.1.19  The x86-64 long mode instructions</a><br/>
<a href="#2.1.20" class="smalltext" style="padding-left: 40pt;">2.1.20  SSE4 instructions</a><br/>
<a href="#2.1.21" class="smalltext" style="padding-left: 40pt;">2.1.21  Other extensions of instruction set</a><br/>
<br/><a href="#2.2" class="mediumtext" style="padding-left: 20pt;">2.2  Control directives</a><br/>
<a href="#2.2.1" class="smalltext" style="padding-left: 40pt;">2.2.1  Numerical constants</a><br/>
<a href="#2.2.2" class="smalltext" style="padding-left: 40pt;">2.2.2  Conditional assembly</a><br/>
<a href="#2.2.3" class="smalltext" style="padding-left: 40pt;">2.2.3  Repeating blocks of instructions</a><br/>
<a href="#2.2.4" class="smalltext" style="padding-left: 40pt;">2.2.4  Addressing spaces</a><br/>
<a href="#2.2.5" class="smalltext" style="padding-left: 40pt;">2.2.5  Other directives</a><br/>
<a href="#2.2.6" class="smalltext" style="padding-left: 40pt;">2.2.6  Multiple passes</a><br/>
<br/><a href="#2.3" class="mediumtext" style="padding-left: 20pt;">2.3  Preprocessor directives</a><br/>
<a href="#2.3.1" class="smalltext" style="padding-left: 40pt;">2.3.1  Including source files</a><br/>
<a href="#2.3.2" class="smalltext" style="padding-left: 40pt;">2.3.2  Symbolic constants</a><br/>
<a href="#2.3.3" class="smalltext" style="padding-left: 40pt;">2.3.3  Macroinstructions</a><br/>
<a href="#2.3.4" class="smalltext" style="padding-left: 40pt;">2.3.4  Structures</a><br/>
<a href="#2.3.5" class="smalltext" style="padding-left: 40pt;">2.3.5  Repeating macroinstructions</a><br/>
<a href="#2.3.6" class="smalltext" style="padding-left: 40pt;">2.3.6  Conditional preprocessing</a><br/>
<a href="#2.3.7" class="smalltext" style="padding-left: 40pt;">2.3.7  Order of processing</a><br/>
<br/><a href="#2.4" class="mediumtext" style="padding-left: 20pt;">2.4  Formatter directives</a><br/>
<a href="#2.4.1" class="smalltext" style="padding-left: 40pt;">2.4.1  MZ executable</a><br/>
<a href="#2.4.2" class="smalltext" style="padding-left: 40pt;">2.4.2  Portable Executable</a><br/>
<a href="#2.4.3" class="smalltext" style="padding-left: 40pt;">2.4.3  Common Object File Format</a><br/>
<a href="#2.4.4" class="smalltext" style="padding-left: 40pt;">2.4.4  Executable and Linkable Format</a><br/>
</b></p>

<p><b>
<span class="mediumtext">Chapter 1</span><br/>
<span class="largetext">Introduction</span><br/>
</b></p>
<p class="smalltext">
This chapter contains the most important information you need to begin
using flat assembler. Even if you are an experienced assembly language programmer,
you should read at least this chapter before using this compiler.
</p>

<p><b>
<a name="1.1" class="mediumtext">1.1  Compiler overview</a>
</b></p>
<p class="smalltext">
Flat assembler is a fast assembly language compiler for the x86 architecture
processors, which does multiple passes to optimize the size of generated
machine code. It is self-compilable and versions for different operating
systems are provided. All the versions are designed to be used from the system
command line and they should not differ in behavior.
</p>

<p><b>
<a name="1.1.1" class="smalltext">1.1.1  System requirements</a>
</b></p>
<p class="smalltext">
All versions require a 32-bit x86 architecture processor (at least 80386),
although they can also produce programs for 16-bit x86 architecture processors.
The DOS version requires an OS compatible with MS DOS 2.0 and either a true
real mode environment or DPMI. The Windows version requires a Win32 console
compatible with version 3.1.
</p>

<p><b>
<a name="1.1.2" class="smalltext">1.1.2  Executing compiler from command line</a>
</b></p>
<p class="smalltext">
To execute flat assembler from the command line you need to provide up to two
parameters: the first should be the name of the source file, the second should be
the name of the destination file. If no second parameter is given, the name of the
output file will be guessed automatically. After displaying short information about
the program name and version, the compiler will read the data from the source file
and compile it. When the compilation is successful, the compiler will write the
generated code to the destination file and display a summary of the compilation
process; otherwise it will display information about the error that occurred.
</p>
<p class="smalltext">
In the command line you can also include the <span class="smallcode">-m</span> option followed by a number,
which specifies the maximum number of kilobytes of memory flat assembler should
use. In the DOS version this option limits only the usage of extended
memory. The <span class="smallcode">-p</span> option followed by a number can be used to specify the limit
on the number of passes the assembler performs. If code cannot be generated
within the specified number of passes, the assembly will be terminated with an
error message. The maximum value of this setting is 65536, while the default
limit, used when no such option is included in the command line, is 100.
</p>
<p class="smalltext">
The source file should be a text file, and can be created in any text
editor. Line breaks are accepted in both DOS and Unix standards, tabulators
are treated as spaces.
</p>
<p class="smalltext">
There are no command line options that would affect the output of the compiler;
flat assembler requires only that the source code include the information it
really needs. For example, to select the output format you use the
<span class="smallcode">format</span> directive at the beginning of the source.
</p>

<p><b>
<a name="1.1.3" class="smalltext">1.1.3  Compiler messages</a>
</b></p>
<p class="smalltext">
As stated above, after a successful compilation the compiler displays
the compilation summary. It shows how many passes were
done, how much time they took, and how many bytes were written into the
destination file.
The following is an example of the compilation summary:
</p>
<pre class="smallcode">flat assembler  version 1.68 (16384 kilobytes memory)
38 passes, 5.3 seconds, 77824 bytes.
</pre>
<p class="smalltext">
In case of error during the compilation process, the program will display an
error message. For example, when compiler can't find the input file, it will
display the following message:
</p>
<pre class="smallcode">flat assembler  version 1.68 (16384 kilobytes memory)
error: source file not found.
</pre>
<p class="smalltext">
If the error is connected with a specific part of the source code, the source line
that caused the error is also displayed, together with the placement of this line in
the source to help you find the error, for example:
</p>
<pre class="smallcode">flat assembler  version 1.68 (16384 kilobytes memory)
example.asm [3]:
        mob     ax,1
error: illegal instruction.
</pre>
<p class="smalltext">
It means that in the third line of the <span class="smallcode">example.asm</span> file the compiler has
encountered an unrecognized instruction. When the line that caused the error
contains a macroinstruction, the line in the macroinstruction definition
that generated the erroneous instruction is also displayed:
</p>
<pre class="smallcode">flat assembler  version 1.68 (16384 kilobytes memory)
example.asm [6]:
        stoschar 7
example.asm [3] stoschar [1]:
        mob     al,char
error: illegal instruction.
</pre>
<p class="smalltext">
It means that the macroinstruction in the sixth line of the <span class="smallcode">example.asm</span> file
generated an unrecognized instruction with the first line of its definition.
</p>
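<p class="smalltext">
A source file laid out as in the following sketch would be consistent with the
line numbers in the message above. The file itself is not shown in this manual, so
the body of the <span class="smallcode">stoschar</span> macroinstruction (including the <span class="smallcode">stosb</span> line) is an
assumption made for illustration; <span class="smallcode">mob</span> is the deliberate misspelling of <span class="smallcode">mov</span>
that triggers the error.
</p>
<pre class="smallcode">macro stoschar char
{
        mob     al,char         ; line 3 of the file, misspelled mov
        stosb                   ; assumed: stores AL at the destination
}
        stoschar 7              ; line 6 of the file, the macroinstruction is used here
</pre>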

<p><b>
<a name="1.1.4" class="smalltext">1.1.4  Output formats</a>
</b></p>
<p class="smalltext">
By default, when there is no <span class="smallcode">format</span> directive in the source file, flat
assembler simply puts the generated instruction codes into the output, creating
a flat binary file this way. By default it generates 16-bit code, but you can always
switch between 16-bit and 32-bit mode by using the <span class="smallcode">use16</span> or <span class="smallcode">use32</span> directive.
Some of the output formats switch into 32-bit mode when selected - more
information about the formats you can choose can be found in <a href="#2.4">2.4</a>.
</p>
<p class="smalltext">
All output code is always in the order in which it was entered into the
source file.
</p>
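<p class="smalltext">
As a minimal sketch of the behavior described above, the following source has no
<span class="smallcode">format</span> directive, so it should produce a flat binary file, and it switches
the code size with the <span class="smallcode">use16</span> and <span class="smallcode">use32</span> directives. The instructions themselves
are chosen only for illustration.
</p>
<pre class="smallcode">; no format directive: the output is a flat binary file
        mov     ax,1            ; 16-bit code, the default
use32
        mov     eax,1           ; assembled for 32-bit mode from here on
use16
        mov     ax,2            ; back to 16-bit code
</pre>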

<p><b>
<a name="1.2" class="mediumtext">1.2  Assembly syntax</a>
</b></p>
<p class="smalltext">
The information provided below is intended mainly for assembly
programmers who have used other assembly compilers before.
If you are a beginner, you should look for assembly programming tutorials.
</p>
<p class="smalltext">
Flat assembler by default uses the Intel syntax for the assembly
instructions, although you can customize it using the preprocessor
capabilities (macroinstructions and symbolic constants). It also has its own
set of the directives - the instructions for compiler.
</p>
<p class="smalltext">
All symbols defined inside the sources are case-sensitive.
</p>

<p><b>
<a name="1.2.1" class="smalltext">1.2.1  Instruction syntax</a>
</b></p>
<p class="smalltext">
Instructions in assembly language are separated by line breaks, and each
instruction is expected to occupy one line of text. If a line contains
a semicolon, except for semicolons inside quoted strings, the rest of
the line is a comment and the compiler ignores it. If a line ends with the <span class="smallcode">\</span>
character (optionally followed by a semicolon and a comment), the next line
is attached at this point.
</p>
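<p class="smalltext">
The rules for comments and line continuation can be sketched as follows; the
instructions and values are arbitrary and serve only as an illustration.
</p>
<pre class="smallcode">        mov     ax,1            ; the rest of this line is a comment
        db      ';'             ; a semicolon inside a quoted string is data
        db      1,2,3,\         ; the next line is attached here
                4,5,6
</pre>
<p class="smalltext">
The last two lines are processed as the single directive <span class="smallcode">db 1,2,3,4,5,6</span>.
</p>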
<p class="smalltext">
Each line in the source is a sequence of items, each of which is one of three
types. The first type is the symbol characters, which are special characters
that are individual items even when they are not separated by spaces from other items.
Any of the <span class="smallcode">+-/*=&lt;&gt;()[]{}:,|&amp;~#`</span> characters is a symbol character. A sequence of
other characters, separated from other items with either blank spaces or
symbol characters, is a symbol. If the first character of a symbol is either a
single or double quote, it integrates any sequence of characters following
it, even the special ones, into a quoted string, which should end with the same
character with which it began (the single or double quote). However, if there
are two such characters in a row (without any other character between them),
they are integrated into the quoted string as just one of them and the quoted
string continues. Symbols other than symbol characters and quoted
strings can be used as names, so they are also called name symbols.
</p>
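<p class="smalltext">
The quoting rule can be illustrated with a few data definitions; the strings are
arbitrary examples.
</p>
<pre class="smallcode">        db      'it''s'         ; two quotes in a row give one, four characters: it's
        db      "say ""hi"""    ; the same rule for double quotes: say "hi"
        db      "it's"          ; the other kind of quote needs no doubling
</pre>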
<p class="smalltext">
Every instruction consists of a mnemonic and a varying number of
operands, separated with commas. An operand can be a register, an immediate value
or data addressed in memory; it can also be preceded by a size operator to
define or override its size (table <a href="#_1.1">1.1</a>). The names of available registers are
listed in table <a href="#_1.2">1.2</a>; their sizes cannot be overridden. An immediate value can be
specified by any numerical expression.
</p>
<p class="smalltext">
When an operand is data in memory, the address of that data (also any
numerical expression, but it may contain registers) should be enclosed in
square brackets or preceded by the <span class="smallcode">ptr</span> operator. For example, the instruction
<span class="smallcode">mov eax,3</span> will put the immediate value 3 into the EAX register, the instruction
<span class="smallcode">mov eax,[7]</span> will put the 32-bit value from address 7 into EAX, and the
instruction <span class="smallcode">mov byte [7],3</span> will put the immediate value 3 into the byte at
address 7; it can also be written as <span class="smallcode">mov byte ptr 7,3</span>.
To specify which segment register should be used for addressing, the segment register name followed
by a colon should be put just before the address value (inside the square
brackets or after the <span class="smallcode">ptr</span> operator).
</p>
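<p class="smalltext">
The operand forms described above can be collected in one short sketch. The first
four instructions come from the text; the last two, with their choice of segment
registers and address expression, are assumptions added for illustration.
</p>
<pre class="smallcode">        mov     eax,3           ; immediate value 3 into EAX
        mov     eax,[7]         ; 32-bit value at address 7 into EAX
        mov     byte [7],3      ; immediate value 3 into the byte at address 7
        mov     byte ptr 7,3    ; the same instruction written with ptr
        mov     al,[es:7]       ; byte at address 7, addressed through ES
        mov     ax,[ds:bx+2]    ; address expression containing a register
</pre>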
<p class="smalltext">
<b><a name="_1.1">Table 1.1  Size operators</a></b>
</p>
<table class="doctable" style="width: 150px;">
  <tr>
    <th>Operator</th>
    <th>Bits</th>
    <th>Bytes</th>
  </tr>
  <tr>
    <td><span class="smallcode">byte</span></td>
    <td>8</td>
    <td>1</td>
  </tr>
  <tr>
    <td><span class="smallcode">word</span></td>
    <td>16</td>
    <td>2</td>
  </tr>
  <tr>
    <td><span class="smallcode">dword</span></td>
    <td>32</td>
    <td>4</td>
  </tr>
  <tr>
    <td><span class="smallcode">fword</span></td>
    <td>48</td>
    <td>6</td>
  </tr>
  <tr>
    <td><span class="smallcode">pword</span></td>
    <td>48</td>
    <td>6</td>
  </tr>
  <tr>
    <td><span class="smallcode">qword</span></td>
    <td>64</td>
    <td>8</td>
  </tr>
  <tr>
    <td><span class="smallcode">tbyte</span></td>
    <td>80</td>
    <td>10</td>
  </tr>
  <tr>
    <td><span class="smallcode">tword</span></td>
    <td>80</td>
    <td>10</td>
  </tr>
  <tr>
    <td><span class="smallcode">dqword</span></td>
    <td>128</td>
    <td>16</td>
  </tr>
</table>
<p class="smalltext">
<b><a name="_1.2">Table 1.2  Registers</a></b>
</p>
<table class="doctable" style="width: 430px;">
  <tr>
    <th style="width: 70px;">Type</th>
    <th style="width: 40px;">Bits</th>
    <th/>
  </tr>
  <tr>
    <td rowspan="3">General</td>
    <td>8</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">al</span></td>
          <td style="width: 12.5%;"><span class="smallcode">cl</span></td>
          <td style="width: 12.5%;"><span class="smallcode">dl</span></td>
          <td style="width: 12.5%;"><span class="smallcode">bl</span></td>
          <td style="width: 12.5%;"><span class="smallcode">ah</span></td>
          <td style="width: 12.5%;"><span class="smallcode">ch</span></td>
          <td style="width: 12.5%;"><span class="smallcode">dh</span></td>
          <td style="width: 12.5%;"><span class="smallcode">bh</span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>16</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">ax</span></td>
          <td style="width: 12.5%;"><span class="smallcode">cx</span></td>
          <td style="width: 12.5%;"><span class="smallcode">dx</span></td>
          <td style="width: 12.5%;"><span class="smallcode">bx</span></td>
          <td style="width: 12.5%;"><span class="smallcode">sp</span></td>
          <td style="width: 12.5%;"><span class="smallcode">bp</span></td>
          <td style="width: 12.5%;"><span class="smallcode">si</span></td>
          <td style="width: 12.5%;"><span class="smallcode">di</span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>32</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">eax</span></td>
          <td style="width: 12.5%;"><span class="smallcode">ecx</span></td>
          <td style="width: 12.5%;"><span class="smallcode">edx</span></td>
          <td style="width: 12.5%;"><span class="smallcode">ebx</span></td>
          <td style="width: 12.5%;"><span class="smallcode">esp</span></td>
          <td style="width: 12.5%;"><span class="smallcode">ebp</span></td>
          <td style="width: 12.5%;"><span class="smallcode">esi</span></td>
          <td style="width: 12.5%;"><span class="smallcode">edi</span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>Segment</td>
    <td>16</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">es</span></td>
          <td style="width: 12.5%;"><span class="smallcode">cs</span></td>
          <td style="width: 12.5%;"><span class="smallcode">ss</span></td>
          <td style="width: 12.5%;"><span class="smallcode">ds</span></td>
          <td style="width: 12.5%;"><span class="smallcode">fs</span></td>
          <td style="width: 12.5%;"><span class="smallcode">gs</span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>Control</td>
    <td>32</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">cr0</span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
          <td style="width: 12.5%;"><span class="smallcode">cr2</span></td>
          <td style="width: 12.5%;"><span class="smallcode">cr3</span></td>
          <td style="width: 12.5%;"><span class="smallcode">cr4</span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>Debug</td>
    <td>32</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">dr0</span></td>
          <td style="width: 12.5%;"><span class="smallcode">dr1</span></td>
          <td style="width: 12.5%;"><span class="smallcode">dr2</span></td>
          <td style="width: 12.5%;"><span class="smallcode">dr3</span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
          <td style="width: 12.5%;"><span class="smallcode"> </span></td>
          <td style="width: 12.5%;"><span class="smallcode">dr6</span></td>
          <td style="width: 12.5%;"><span class="smallcode">dr7</span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>FPU</td>
    <td>80</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">st0</span></td>
          <td style="width: 12.5%;"><span class="smallcode">st1</span></td>
          <td style="width: 12.5%;"><span class="smallcode">st2</span></td>
          <td style="width: 12.5%;"><span class="smallcode">st3</span></td>
          <td style="width: 12.5%;"><span class="smallcode">st4</span></td>
          <td style="width: 12.5%;"><span class="smallcode">st5</span></td>
          <td style="width: 12.5%;"><span class="smallcode">st6</span></td>
          <td style="width: 12.5%;"><span class="smallcode">st7</span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>MMX</td>
    <td>64</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">mm0</span></td>
          <td style="width: 12.5%;"><span class="smallcode">mm1</span></td>
          <td style="width: 12.5%;"><span class="smallcode">mm2</span></td>
          <td style="width: 12.5%;"><span class="smallcode">mm3</span></td>
          <td style="width: 12.5%;"><span class="smallcode">mm4</span></td>
          <td style="width: 12.5%;"><span class="smallcode">mm5</span></td>
          <td style="width: 12.5%;"><span class="smallcode">mm6</span></td>
          <td style="width: 12.5%;"><span class="smallcode">mm7</span></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>SSE</td>
    <td>128</td>
    <td>
      <table class="intable">
        <tr>
          <td style="width: 12.5%;"><span class="smallcode">xmm0</span></td>
          <td style="width: 12.5%;"><span class="smallcode">xmm1</span></td>
          <td style="width: 12.5%;"><span class="smallcode">xmm2</span></td>
          <td style="width: 12.5%;"><span class="smallcode">xmm3</span></td>
          <td style="width: 12.5%;"><span class="smallcode">xmm4</span></td>
          <td style="width: 12.5%;"><span class="smallcode">xmm5</span></td>
          <td style="width: 12.5%;"><span class="smallcode">xmm6</span></td>
          <td style="width: 12.5%;"><span class="smallcode">xmm7</span></td>
        </tr>
      </table>
    </td>
  </tr>
</table>
<p><b>
<a name="1.2.2" class="smalltext">1.2.2  Data definitions</a>
</b></p>
<p class="smalltext">
To define data or reserve space for it, use one of the directives listed in
table <a href="#_1.3">1.3</a>. A data definition directive should be followed by one or more
numerical expressions, separated with commas. These expressions define the
values for data cells whose size depends on which directive is used. For
example <span class="smallcode">db 1,2,3</span> will define three bytes with the values 1, 2 and 3
respectively.
</p>
<p class="smalltext">
The <span class="smallcode">db</span> and <span class="smallcode">du</span> directives also accept quoted string values of any
length, which will be converted into a chain of bytes when <span class="smallcode">db</span> is used and into
a chain of words with zeroed high byte when <span class="smallcode">du</span> is used.
For example <span class="smallcode">db 'abc'</span> will define three bytes with the hexadecimal values 61h, 62h and 63h.
</p>
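<p class="smalltext">
A few definitions, given here as an illustrative sketch with arbitrary values, show
how numbers and quoted strings are turned into data:
</p>
<pre class="smallcode">        db      1,2,3           ; three bytes: 1, 2, 3
        dw      1234h,5678h     ; two words
        db      'abc'           ; three bytes: 61h, 62h, 63h
        du      'abc'           ; three words: 0061h, 0062h, 0063h
</pre>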
<p class="smalltext">
The <span class="smallcode">dp</span> directive and its synonym <span class="smallcode">df</span> accept values consisting of two
numerical expressions separated with a colon; the first value will become the
high word and the second value will become the low double word of the far
pointer value. The <span class="smallcode">dd</span> directive also accepts such pointers, consisting of two word values
separated with a colon, and <span class="smallcode">dt</span> accepts a word and a quad word value separated
with a colon, of which the quad word is stored first. The <span class="smallcode">dt</span> directive with a single
expression as parameter accepts only floating point values and creates data in
FPU double extended precision format.
</p>
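<p class="smalltext">
The pointer and floating point forms might be used as in the following sketch; the
values are arbitrary and chosen only to make the two parts of each pointer easy to
tell apart.
</p>
<pre class="smallcode">        dp      1234h:56789ABCh ; far pointer: high word 1234h, low double word 56789ABCh
        df      1234h:56789ABCh ; df is a synonym of dp
        dd      1234h:5678h     ; far pointer built from two word values
        dt      1.5             ; floating point value in FPU double extended precision format
</pre>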
<p class="smalltext">
Any of the above directives allows the usage of the special <span class="smallcode">dup</span> operator to
make multiple copies of given values. The count of duplicates should precede
this operator and the value to duplicate should follow - it can even be a
chain of values separated with commas, but such a set of values needs to be
enclosed in parentheses, like <span class="smallcode">db 5 dup (1,2)</span>, which defines five copies
of the given two byte sequence.
</p>
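<p class="smalltext">
As a short illustration of the <span class="smallcode">dup</span> operator (the second line is an additional
example not taken from the text):
</p>
<pre class="smallcode">        db      5 dup (1,2)     ; ten bytes: 1,2,1,2,1,2,1,2,1,2
        dw      3 dup (0)       ; three words, each equal to zero
</pre>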
<p class="smalltext">
The <span class="smallcode">file</span> directive is special and its syntax is different. This
directive includes a chain of bytes from a file and it should be followed by the
quoted file name, then optionally a numerical expression specifying the offset in
the file preceded by a colon, then - also optionally - a comma and a numerical
expression specifying the count of bytes to include (if no count is specified, all
data up to the end of the file is included). For example <span class="smallcode">file 'data.bin'</span> will
include the whole file as binary data and <span class="smallcode">file 'data.bin':10h,4</span> will include
only four bytes starting at offset 10h.
</p>
<p class="smalltext">
A data reservation directive should be followed by only one numerical
expression, and this value defines how many cells of the specified size should
be reserved. All data definition directives also accept the <span class="smallcode">?</span> value, which
means that this cell should not be initialized to any value; the effect is
the same as using the data reservation directive. Uninitialized data
may not be included in the output file, so its values should always be
considered unknown.
</p>
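<p class="smalltext">
Reserving data might look like the sketch below; the label names <span class="smallcode">buffer</span>, <span class="smallcode">table</span>
and <span class="smallcode">value</span> are hypothetical and the counts are arbitrary.
</p>
<pre class="smallcode">buffer  rb      64              ; reserve 64 bytes
table   rw      8               ; reserve 8 words (16 bytes)
value   dd      ?               ; one uninitialized double word, same effect as rd 1
</pre>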
<p class="smalltext">
<b><a name="_1.3">Table 1.3  Data directives</a></b>
</p>
<table class="doctable" style="width: 150px;">
  <tr>
    <th>Size (bytes)</th>
    <th>Define data</th>
    <th>Reserve data</th>
  </tr>
  <tr>
    <td>1</td>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">db</span></td></tr>
        <tr><td><span class="smallcode">file</span></td></tr>
      </table>
    </td>
    <td><span class="smallcode">rb</span></td>
  </tr>
  <tr>
    <td>2</td>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">dw</span></td></tr>
        <tr><td><span class="smallcode">du</span></td></tr>
      </table>
    </td>
    <td><span class="smallcode">rw</span></td>
  </tr>
  <tr>
    <td>4</td>
    <td><span class="smallcode">dd</span></td>
    <td><span class="smallcode">rd</span></td>
  </tr>
  <tr>
    <td>6</td>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">dp</span></td></tr>
        <tr><td><span class="smallcode">df</span></td></tr>
      </table>
    </td>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">rp</span></td></tr>
        <tr><td><span class="smallcode">rf</span></td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>8</td>
    <td><span class="smallcode">dq</span></td>
    <td><span class="smallcode">rq</span></td>
  </tr>
  <tr>
    <td>10</td>
    <td><span class="smallcode">dt</span></td>
    <td><span class="smallcode">rt</span></td>
  </tr>
</table>
<p><b>
<a name="1.2.3" class="smalltext">1.2.3  Constants and labels</a>
</b></p>
<p class="smalltext">
In numerical expressions you can also use constants or labels instead of
numbers. To define a constant or label you should use the specific
directives. Each label can be defined only once and it is accessible from
any place in the source (even before it was defined). A constant can be redefined
many times, but in this case it is accessible only after it was defined, and
is always equal to the value from the last definition before the place where it's
used. When a constant is defined only once in the source, it is - like a label -
accessible from anywhere.
</p>
<p class="smalltext">
The definition of a constant consists of the name of the constant followed by the
<span class="smallcode">=</span> character and a numerical expression, which after calculation will become
the value of the constant. This value is always calculated at the time the
constant is defined. For example you can define the <span class="smallcode">count</span> constant by using the
directive <span class="smallcode">count = 17</span> and then use it in assembly instructions, like
<span class="smallcode">mov cx,count</span> - which will become <span class="smallcode">mov cx,17</span> during the compilation process.
</p>
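<p class="smalltext">
The following sketch shows a constant being defined, used and then redefined. The
redefinition and the use of the DX register are additions made for illustration;
each use sees the value from the last definition before it.
</p>
<pre class="smallcode">count = 17
        mov     cx,count        ; assembled as mov cx,17
count = count+1                 ; redefinition, calculated at this point: 18
        mov     dx,count        ; assembled as mov dx,18
</pre>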
<p class="smalltext">
There are different ways to define labels. The simplest is to follow the
name of the label with a colon; this directive can even be followed by another
instruction in the same line. It defines a label whose value is equal to the
offset of the point where it's defined. This method is usually used to label
places in code. The other way is to follow the name of the label (without a
colon) with some data directive. It defines a label with a value equal to the
offset of the beginning of the defined data, remembered as a label for data
with the cell size specified for that data directive in table <a href="#_1.3">1.3</a>.
</p>
<p class="smalltext">
The label can be treated as constant of value equal to offset of labeled
code or data. For example when you define data using the labeled directive
<span class="smallcode">char db 224</span>, to put the offset of this data into BX register you should use
<span class="smallcode">mov bx,char</span> instruction, and to put the value of byte addressed by <span class="smallcode">char</span>
label to DL register, you should use <span class="smallcode">mov dl,[char]</span> (or <span class="smallcode">mov dl,ptr char</span>).
But when you try to assemble <span class="smallcode">mov ax,[char]</span>, it will cause an error, because
fasm compares the sizes of operands, which should be equal. You can force
assembling that instruction by using size override: <span class="smallcode">mov ax,word [char]</span>,
but remember that this instruction will read the two bytes beginning at <span class="smallcode">char</span>
address, while it was defined as a one byte.
</p>
<p class="smalltext">
The last and the most flexible way to define labels is to use <span class="smallcode">label</span>
directive. This directive should be followed by the name of label, then
optionally size operator (it can be preceded by a colon) and then - also
optionally <span class="smallcode">at</span> operator and the numerical expression defining the address at
which this label should be defined. For example <span class="smallcode">label wchar word at char</span>
will define a new label for the 16-bit data at the address of <span class="smallcode">char</span>. Now the
instruction <span class="smallcode">mov ax,[wchar]</span> will be after compilation the same as
<span class="smallcode">mov ax,word [char]</span>. If no address is specified, <span class="smallcode">label</span> directive defines
the label at current offset. Thus <span class="smallcode">mov [wchar],57568</span> will copy two bytes
while <span class="smallcode">mov [char],224</span> will copy one byte to the same address.
</p>
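<p class="smalltext">
The ways of defining and using data labels described above can be put together
in one short sketch. It reuses the <span class="smallcode">char</span> and <span class="smallcode">wchar</span> names from the text; the
placement of data before the instructions is only for illustration:
</p>
<pre class="smallcode">    char db 224               ; label for byte data
    label wchar word at char  ; 16-bit view of the same address

    mov bx,char               ; offset of the data into BX
    mov dl,[char]             ; byte addressed by char into DL
    mov ax,[wchar]            ; same as mov ax,word [char]
    mov [char],224            ; stores one byte
    mov [wchar],57568         ; stores two bytes at the same address
</pre>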
<p class="smalltext">
The label whose name begins with dot is treated as local label, and its name
is attached to the name of last global label (with name beginning with
anything but dot) to make the full name of this label. So you can use the
short name (beginning with dot) of this label anywhere before the next global
label is defined, and in the other places you have to use the full name. Labels
beginning with two dots are the exception - they are like global, but they
don't become the new prefix for local labels.
</p>
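<p class="smalltext">
A minimal sketch of local labels follows. The names <span class="smallcode">first_routine</span> and
<span class="smallcode">second_routine</span> are hypothetical and serve only to show how the short and
full names relate:
</p>
<pre class="smallcode">first_routine:
  .loop:                      ; full name is first_routine.loop
    dec cx
    jnz .loop
    ret

second_routine:
  .loop:                      ; full name is second_routine.loop
    dec dx
    jnz .loop                 ; refers to second_routine.loop
    jmp first_routine.loop    ; full name is needed here
</pre>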
<p class="smalltext">
The <span class="smallcode">@@</span> name means an anonymous label; you can define many of them in
the source. Symbol <span class="smallcode">@b</span> (or equivalent <span class="smallcode">@r</span>) references the nearest preceding
anonymous label, symbol <span class="smallcode">@f</span> references the nearest following anonymous label.
These special symbols are case-insensitive.
</p>
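<p class="smalltext">
For example, a short fragment (illustrative only) with two anonymous labels
could look like this:
</p>
<pre class="smallcode">    mov cx,10
  @@:
    dec cx
    jnz @b          ; back to the nearest preceding @@
    jmp @f          ; forward to the nearest following @@
    nop
  @@:
    ret
</pre>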

<p><b>
<a name="1.2.4" class="smalltext">1.2.4  Numerical expressions</a>
</b></p>
<p class="smalltext">
In the above examples all the numerical expressions were the simple numbers,
constants or labels. But they can be more complex, by using the arithmetical
or logical operators for calculations at compile time. All these operators
with their priority values are listed in table <a href="#_1.4">1.4</a>.
The operations with higher priority value will be calculated first; you can
of course change this behavior by putting some parts of the expression in
parentheses. The <span class="smallcode">+</span>, <span class="smallcode">-</span>, <span class="smallcode">*</span> and <span class="smallcode">/</span> are standard arithmetical operations,
<span class="smallcode">mod</span> calculates the remainder from division. The <span class="smallcode">and</span>, <span class="smallcode">or</span>, <span class="smallcode">xor</span>, <span class="smallcode">shl</span>,
<span class="smallcode">shr</span> and <span class="smallcode">not</span> perform the same logical operations as assembly instructions
of those names. The <span class="smallcode">rva</span> performs the conversion of an address into the
relocatable offset and is specific to some of the output formats (see <a href="#2.4">2.4</a>).
</p>
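<p class="smalltext">
A few constant definitions can illustrate how the priorities work. The names
are hypothetical and the values in the comments follow from the priorities
listed in table <a href="#_1.4">1.4</a>:
</p>
<pre class="smallcode">    a = 2 + 3 * 4       ; * is calculated before +, value is 14
    b = (2 + 3) * 4     ; parentheses change the order, value is 20
    c = 1 shl 4 or 1    ; shl is calculated before or, value is 17
    d = 17 mod 5        ; remainder from division, value is 2
</pre>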
<p class="smalltext">
The numbers in the expression are by default treated as decimal. Binary
numbers should have the <span class="smallcode">b</span> letter attached at the end, octal numbers should
end with the <span class="smallcode">o</span> letter, hexadecimal numbers should begin with <span class="smallcode">0x</span> characters
(like in C language) or with the <span class="smallcode">$</span> character (like in Pascal language) or
they should end with <span class="smallcode">h</span> letter. Also quoted string, when encountered in
expression, will be converted into number - the first character will become
the least significant byte of number.
</p>
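<p class="smalltext">
As an illustration of these notations, the first six instructions below all
load the same value of ten, and the last one shows how a quoted string is
converted into a number:
</p>
<pre class="smallcode">    mov al,10       ; decimal
    mov al,1010b    ; binary
    mov al,12o      ; octal
    mov al,0Ah      ; hexadecimal with h suffix
    mov al,0x0A     ; hexadecimal with 0x prefix
    mov al,$0A      ; hexadecimal with $ prefix
    mov ax,'AB'     ; 'A' becomes the low byte, value is 4241h
</pre>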
<p class="smalltext">
The numerical expression used as an address value can also contain any of
general registers used for addressing, they can be added and multiplied by
appropriate values, as it is allowed for the x86 architecture instructions.
</p>
<p class="smalltext">
There are also some special symbols that can be used inside the numerical
expression. First is <span class="smallcode">$</span>, which is always equal to the value of current
offset, while <span class="smallcode">$$</span> is equal to base address of current addressing space.
The other one is <span class="smallcode">%</span>, which is the number of current repeat in parts of code
that are repeated using some special directives (see <a href="#2.2">2.2</a>). There's also <span class="smallcode">%t</span>
symbol, which is always equal to the current time stamp.
</p>
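<p class="smalltext">
A common use of the <span class="smallcode">$</span> symbol is calculating the size of data that was
just defined. In this sketch the names <span class="smallcode">message</span>, <span class="smallcode">message_length</span> and
<span class="smallcode">position</span> are hypothetical:
</p>
<pre class="smallcode">    message db 'Hello'
    message_length = $ - message    ; current offset minus start of data, value is 5
    position = $ - $$               ; distance from the base of addressing space
</pre>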
<p class="smalltext">
Any numerical expression can also consist of single floating point value
(flat assembler does not allow any floating point operations at compilation
time) in the scientific notation, they can end with the <span class="smallcode">f</span> letter to be
recognized, otherwise they should contain at least one of the <span class="smallcode">.</span> or <span class="smallcode">E</span>
characters. So <span class="smallcode">1.0</span>,  <span class="smallcode">1E0</span> and <span class="smallcode">1f</span> define the same floating point value,
while simple <span class="smallcode">1</span> defines an integer value.
</p>
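<p class="smalltext">
For illustration, the three notations from the paragraph above can be used with
a data directive in this way:
</p>
<pre class="smallcode">    dd 1.0          ; floating point value
    dd 1E0          ; the same floating point value
    dd 1f           ; the same floating point value
    dd 1            ; integer value
</pre>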
<p class="smalltext">
<b><a name="_1.4">Table 1.4  Arithmetical and logical operators by priority</a></b>
</p>
<table class="doctable" style="width: 200px;">
  <tr>
    <th>Priority</th>
    <th>Operators</th>
  </tr>
  <tr>
    <td>0</td>
    <td><span class="smallcode">+  -</span></td>
  </tr>
  <tr>
    <td>1</td>
    <td><span class="smallcode">*  /</span></td>
  </tr>
  <tr>
    <td>2</td>
    <td><span class="smallcode">mod</span></td>
  </tr>
  <tr>
    <td>3</td>
    <td><span class="smallcode">and  or  xor</span></td>
  </tr>
  <tr>
    <td>4</td>
    <td><span class="smallcode">shl  shr</span></td>
  </tr>
  <tr>
    <td>5</td>
    <td><span class="smallcode">not</span></td>
  </tr>
  <tr>
    <td>6</td>
    <td><span class="smallcode">rva</span></td>
  </tr>
</table>

<p><b>
<a name="1.2.5" class="smalltext">1.2.5  Jumps and calls</a>
</b></p>
<p class="smalltext">
The operand of any jump or call instruction can be preceded not only by the
size operator, but also by one of the operators specifying type of the jump:
<span class="smallcode">short</span>, <span class="smallcode">near</span> or <span class="smallcode">far</span>. For example, when assembler is in 16-bit mode,
instruction <span class="smallcode">jmp dword [0]</span> will become the far jump and when assembler is
in 32-bit mode, it will become the near jump. To force this instruction to be
treated differently, use the <span class="smallcode">jmp near dword [0]</span> or <span class="smallcode">jmp far dword [0]</span> form.
</p>
<p class="smalltext">
When operand of near jump is the immediate value, assembler will generate
the shortest variant of this jump instruction if possible (but won't create
32-bit instruction in 16-bit mode nor 16-bit instruction in 32-bit mode,
unless there is a size operator stating it). By specifying the jump type
you can force it to always generate long variant (for example <span class="smallcode">jmp near 0</span>)
or to always generate short variant and terminate with an error when it's
impossible (for example <span class="smallcode">jmp short 0</span>).
</p>
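<p class="smalltext">
The three possibilities can be sketched as follows, with <span class="smallcode">target</span> being a
hypothetical label placed close enough for the short variant to be possible:
</p>
<pre class="smallcode">    jmp target          ; assembler chooses the shortest variant possible
    jmp short target    ; always short variant, error if out of range
    jmp near target     ; always long variant
  target:
</pre>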

<p><b>
<a name="1.2.6" class="smalltext">1.2.6  Size settings</a>
</b></p>
<p class="smalltext">
When an instruction uses some memory addressing, by default the smallest form of
the instruction is generated, using the short displacement whenever the address
value fits in its range. This can be overridden using the <span class="smallcode">word</span> or <span class="smallcode">dword</span>
operator before the address inside the square brackets (or after the <span class="smallcode">ptr</span>
operator), which forces the long displacement of appropriate size to be made.
When the address is not relative to any registers, those operators also allow
choosing the appropriate mode of absolute addressing.
</p>
<p class="smalltext">
Instructions <span class="smallcode">adc</span>, <span class="smallcode">add</span>, <span class="smallcode">and</span>, <span class="smallcode">cmp</span>,
<span class="smallcode">or</span>, <span class="smallcode">sbb</span>, <span class="smallcode">sub</span> and <span class="smallcode">xor</span>
with first operand being 16-bit or 32-bit are by default generated in shortened
8-bit form when the second operand is immediate value fitting in the range
for signed 8-bit values. It also can be overridden by putting the <span class="smallcode">word</span> or
<span class="smallcode">dword</span> operator before the immediate value.
Similar rules apply to the <span class="smallcode">imul</span> instruction with the last operand being immediate value.
</p>
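<p class="smalltext">
A short sketch of the override described above, using the <span class="smallcode">add</span> instruction:
</p>
<pre class="smallcode">    add eax,1           ; shortened form with 8-bit immediate
    add eax,dword 1     ; forced form with 32-bit immediate
    add ax,word 1       ; forced form with 16-bit immediate
</pre>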
<p class="smalltext">
Immediate value as an operand for <span class="smallcode">push</span> instruction without a size operator
is by default treated as a word value if assembler is in 16-bit mode and as a
double word value if assembler is in 32-bit mode, shorter 8-bit form of this
instruction is used if possible, <span class="smallcode">word</span> or <span class="smallcode">dword</span> size operator forces the
<span class="smallcode">push</span> instruction to be generated in longer form for specified size. <span class="smallcode">pushw</span>
and <span class="smallcode">pushd</span> mnemonics force assembler to generate 16-bit or 32-bit code
without forcing it to use the longer form of instruction.
</p>
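<p class="smalltext">
The variants of <span class="smallcode">push</span> with an immediate value described above can be
compared side by side (the value is chosen only for illustration):
</p>
<pre class="smallcode">    push 1          ; size follows current mode, shorter form is used
    push dword 1    ; longer form with 32-bit immediate is forced
    pushd 1         ; 32-bit push, shorter form still allowed
    pushw 1         ; 16-bit push, shorter form still allowed
</pre>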

<p><b>
<span class="mediumtext">Chapter 2</span><br/>
<span class="largetext">Instruction set</span><br/>
</b></p>

<p><b>
<a name="2.1" class="mediumtext">2.1  The x86 architecture instructions</a>
</b></p>

<p class="smalltext">
In this section you can find information about both the syntax and the
purpose of the assembly language instructions. If you need more technical
information, look for the Intel Architecture Software Developer's Manual.
</p>
<p class="smalltext">
Assembly instructions consist of the mnemonic (instruction's name) and from
zero to three operands. If there are two or more operands, usually first is
the destination operand and second is the source operand. Each operand can be
register, memory or immediate value (see <a href="#1.2">1.2</a> for details about syntax of
operands). After the description of each instruction there are examples
of different combinations of operands, if the instruction has any.
</p>
<p class="smalltext">
Some instructions act as prefixes and can be followed by other instruction
in the same line, and there can be more than one prefix in a line. Each name
of the segment register is also a mnemonic of instruction prefix, although it
is recommended to use segment overrides inside the square brackets instead of
these prefixes.
</p>

<p><b>
<a name="2.1.1" class="smalltext">2.1.1  Data movement instructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">mov</span> transfers a byte, word or double word from the source operand to the
destination operand. It can transfer data between general registers, from
the general register to memory, or from memory to general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
general register or memory, segment register to general register or memory,
general register or memory to segment register, control or debug register to
general register and general register to control or debug register. The <span class="smallcode">mov</span>
can be assembled only if the size of source operand and size of destination
operand are the same. Below are the examples for each of the allowed
combinations:
</p>
<pre class="smallcode">    mov bx,ax       ; general register to general register
    mov [char],al   ; general register to memory
    mov bl,[char]   ; memory to general register
    mov dl,32       ; immediate value to general register
    mov [char],32   ; immediate value to memory
    mov ax,ds       ; segment register to general register
    mov [bx],ds     ; segment register to memory
    mov ds,ax       ; general register to segment register
    mov ds,[bx]     ; memory to segment register
    mov eax,cr0     ; control register to general register
    mov cr3,ebx     ; general register to control register
</pre>
<p class="smalltext">
<span class="smallcode">xchg</span> swaps the contents of two operands. It can swap two byte operands,
two word operands or two double word operands. Order of operands is not
important. The operands may be two general registers, or general register
with memory. For example:
</p>
<pre class="smallcode">    xchg ax,bx      ; swap two general registers
    xchg al,[char]  ; swap register with memory
</pre>
<p class="smalltext">
<span class="smallcode">push</span> decrements the stack pointer (ESP register), then transfers
the operand to the top of stack indicated by ESP. The operand can be memory,
general register, segment register or immediate value of word or double word
size. If operand is an immediate value and no size is specified, it is by
default treated as a word value if assembler is in 16-bit mode and as a double
word value if assembler is in 32-bit mode. <span class="smallcode">pushw</span> and <span class="smallcode">pushd</span> mnemonics are
variants of this instruction that store the values of word or double word size
respectively. If more operands follow in the same line (separated only with
spaces, not commas), compiler will assemble chain of the <span class="smallcode">push</span> instructions
with these operands. The examples are with single operands:
</p>
<pre class="smallcode">    push ax         ; store general register
    push es         ; store segment register
    pushw [bx]      ; store memory
    push 1000h      ; store immediate value
</pre>
<p class="smalltext">
<span class="smallcode">pusha</span> saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit, assembler automatically generates the appropriate
version for current mode, but it can be overridden by using <span class="smallcode">pushaw</span> or
<span class="smallcode">pushad</span> mnemonic to always get the 16-bit or 32-bit version. The 16-bit
version of this instruction pushes general registers on the stack in the
following order: AX, CX, DX, BX, the initial value of SP before AX was pushed,
BP, SI and DI. The 32-bit version pushes equivalent 32-bit general registers
in the same order.
</p>
<p class="smalltext">
<span class="smallcode">pop</span> transfers the word or double word at the current top of stack to the
destination operand, and then increments ESP to point to the new top of stack.
The operand can be memory, general register or segment register. <span class="smallcode">popw</span> and
<span class="smallcode">popd</span> mnemonics are variants of this instruction for restoring the values of
word or double word size respectively. If more operands separated with spaces
follow in the same line, compiler will assemble chain of the <span class="smallcode">pop</span>
instructions with these operands.
</p>
<pre class="smallcode">    pop bx          ; restore general register
    pop ds          ; restore segment register
    popw [si]       ; restore memory
</pre>
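<p class="smalltext">
The chained forms of <span class="smallcode">push</span> and <span class="smallcode">pop</span> can be sketched as below. Because
the stack returns values in reverse order, the operands of <span class="smallcode">pop</span> are listed
in the opposite order to restore each register:
</p>
<pre class="smallcode">    push ax bx cx   ; same as push ax, push bx, push cx
    pop cx bx ax    ; same as pop cx, pop bx, pop ax
</pre>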
<p class="smalltext">
<span class="smallcode">popa</span> restores the registers saved on the stack by <span class="smallcode">pusha</span> instruction,
except for the saved value of SP (or ESP), which is ignored. This instruction
has no operands. To force assembling 16-bit or 32-bit version of this
instruction use <span class="smallcode">popaw</span> or <span class="smallcode">popad</span> mnemonic.
</p>
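<p class="smalltext">
A typical use, sketched here with arbitrary values, is preserving all the
general registers around a fragment of code that modifies them:
</p>
<pre class="smallcode">    pusha           ; save all general registers
    mov ax,1        ; registers can be freely changed here
    mov bx,2
    popa            ; AX and BX have their saved values again
</pre>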

<p><b>
<a name="2.1.2" class="smalltext">2.1.2  Type conversion instructions</a>
</b></p>

<p class="smalltext">
The type conversion instructions convert bytes into words, words into double
words, and double words into quad words. These conversions can be done using
the sign extension or zero extension. The sign extension fills the extra bits
of the larger item with the value of the sign bit of the smaller item, the
zero extension simply fills them with zeros.
</p>
<p class="smalltext">
<span class="smallcode">cwd</span> and <span class="smallcode">cdq</span> double the size of the value in the AX or EAX register respectively
and store the extra bits into the DX or EDX register. The conversion is done
using the sign extension. These instructions have no operands.
</p>
<p class="smalltext">
<span class="smallcode">cbw</span> extends the sign of the byte in AL throughout AX, and <span class="smallcode">cwde</span> extends
the sign of the word in AX throughout EAX. These instructions also have no
operands.
</p>
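<p class="smalltext">
For illustration, the sign extension of a negative byte through these
instructions proceeds as follows:
</p>
<pre class="smallcode">    mov al,-5       ; AL = 0FBh
    cbw             ; AX = 0FFFBh
    cwd             ; DX = 0FFFFh, AX = 0FFFBh
</pre>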
<p class="smalltext">
<span class="smallcode">movsx</span> converts a byte to word or double word and a word to double word
using the sign extension. <span class="smallcode">movzx</span> does the same, but it uses the zero
extension. The source operand can be general register or memory, while the
destination operand must be a general register. For example:
</p>
<pre class="smallcode">    movsx ax,al         ; byte register to word register
    movsx edx,dl        ; byte register to double word register
    movsx eax,ax        ; word register to double word register
    movsx ax,byte [bx]  ; byte memory to word register
    movsx edx,byte [bx] ; byte memory to double word register
    movsx eax,word [bx] ; word memory to double word register
</pre>

<p><b>
<a name="2.1.3" class="smalltext">2.1.3  Binary arithmetic instructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">add</span> replaces the destination operand with the sum of the source and
destination operands and sets CF if a carry out of the highest bit has occurred. The operands may
be bytes, words or double words. The destination operand can be general
register or memory, the source operand can be general register or immediate
value, it can also be memory if the destination operand is register.
</p>
<pre class="smallcode">    add ax,bx       ; add register to register
    add ax,[si]     ; add memory to register
    add [di],al     ; add register to memory
    add al,48       ; add immediate value to register
    add [char],48   ; add immediate value to memory
</pre>
<p class="smalltext">
<span class="smallcode">adc</span> sums the operands, adds one if CF is set, and replaces the destination
operand with the result. Rules for the operands are the same as for the <span class="smallcode">add</span>
instruction. An <span class="smallcode">add</span> followed by multiple <span class="smallcode">adc</span> instructions can be used to
add numbers longer than 32 bits.
</p>
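<p class="smalltext">
A minimal sketch of such an addition, assuming that one 64-bit number is held in
EDX (upper half) and EAX (lower half) and the other one in ECX and EBX:
</p>
<pre class="smallcode">    add eax,ebx     ; add lower halves, CF receives the carry
    adc edx,ecx     ; add upper halves plus the carry
</pre>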
<p class="smalltext">
<span class="smallcode">inc</span> adds one to the operand, it does not affect CF. The operand can be
general register or memory, and the size of the operand can be byte, word or double word.
</p>
<pre class="smallcode">    inc ax          ; increment register by one
    inc byte [bx]   ; increment memory by one
</pre>
<p class="smalltext">
<span class="smallcode">sub</span> subtracts the source operand from the destination operand and replaces
the destination operand with the result. If a borrow is required, the CF is
set. Rules for the operands are the same as for the <span class="smallcode">add</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">sbb</span> subtracts the source operand from the destination operand, subtracts
one if CF is set, and stores the result to the destination operand. Rules for
the operands are the same as for the <span class="smallcode">add</span> instruction. A <span class="smallcode">sub</span> followed by
multiple <span class="smallcode">sbb</span> instructions may be used to subtract numbers longer than 32
bits.
</p>
<p class="smalltext">
<span class="smallcode">dec</span> subtracts one from the operand, it does not affect CF. Rules for the
operand are the same as for the <span class="smallcode">inc</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">cmp</span> subtracts the source operand from the destination operand. It updates
the flags as the <span class="smallcode">sub</span> instruction, but does not alter the source and
destination operands. Rules for the operands are the same as for the <span class="smallcode">sub</span>
instruction.
</p>
<p class="smalltext">
<span class="smallcode">neg</span> subtracts a signed integer operand from zero. The effect of this
instruction is to reverse the sign of the operand from positive to negative or
from negative to positive. Rules for the operand are the same as for the <span class="smallcode">inc</span>
instruction.
</p>
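<p class="smalltext">
For example (the value is arbitrary):
</p>
<pre class="smallcode">    mov ax,5
    neg ax          ; AX = 0FFFBh, which is -5
    neg ax          ; AX = 5 again
</pre>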
<p class="smalltext">
<span class="smallcode">xadd</span> exchanges the destination operand with the source operand, then loads
the sum of the two values into the destination operand. The destination operand
can be general register or memory, the source operand must be a general register.
</p>
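<p class="smalltext">
The effect of <span class="smallcode">xadd</span> can be illustrated with two registers holding arbitrary
values:
</p>
<pre class="smallcode">    mov ax,3
    mov bx,4
    xadd ax,bx      ; BX = 3 (previous AX), AX = 7 (the sum)
</pre>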
<p class="smalltext">
All the above binary arithmetic instructions update SF, ZF, PF and OF flags.
SF is always set to the same value as the result's sign bit, ZF is set when
all the bits of result are zero, PF is set when low order eight bits of result
contain an even number of set bits, OF is set if result is too large for a
positive number or too small for a negative number (excluding sign bit) to fit
in destination operand.
</p>
<p class="smalltext">
<span class="smallcode">mul</span> performs an unsigned multiplication of the operand and the
accumulator. If the operand is a byte, the processor multiplies it by the
contents of AL and returns the 16-bit result to AH and AL. If the operand is a
word, the processor multiplies it by the contents of AX and returns the 32-bit
result to DX and AX. If the operand is a double word, the processor multiplies
it by the contents of EAX and returns the 64-bit result in EDX and EAX. <span class="smallcode">mul</span>
sets CF and OF when the upper half of the result is nonzero, otherwise they
are cleared. Rules for the operand are the same as for the <span class="smallcode">inc</span> instruction.
</p>
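<p class="smalltext">
Two illustrative multiplications, one of bytes and one of words, show where
the result is placed:
</p>
<pre class="smallcode">    mov al,200
    mov bl,3
    mul bl          ; AX = 600 (0258h), CF and OF are set as AH is nonzero

    mov ax,1000h
    mov cx,10h
    mul cx          ; DX = 0001h, AX = 0000h
</pre>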
<p class="smalltext">
<span class="smallcode">imul</span> performs a signed multiplication operation. This instruction has
three variations. First has one operand and behaves in the same way as the
<span class="smallcode">mul</span> instruction. Second has two operands, in this case destination operand
is multiplied by the source operand and the result replaces the destination
operand. Destination operand must be a general register, it can be word or
double word, source operand can be general register, memory or immediate
value. Third form has three operands, the destination operand must be a general register,
word or double word in size, source operand can be general register or memory, and
third operand must be an immediate value. The source operand is multiplied by
the immediate value and the result is stored in the destination register.
All the three forms calculate the product to twice the size of operands and
set CF and OF when the upper half of the product holds significant bits (that
is, when it is not just the sign extension of the lower half), but second and
third form truncate the product to the size of operands. So second and third
forms can be also used for unsigned operands because, whether the operands
are signed or unsigned, the lower half of the product is the same.
Below are the examples for all three forms:
</p>
<pre class="smallcode">    imul bl         ; accumulator by register
    imul word [si]  ; accumulator by memory
    imul bx,cx      ; register by register
    imul bx,[si]    ; register by memory
    imul bx,10      ; register by immediate value
    imul ax,bx,10   ; register by immediate value to register
    imul ax,[si],10 ; memory by immediate value to register
</pre>
<p class="smalltext">
<span class="smallcode">div</span> performs an unsigned division of the accumulator by the operand.
The dividend (the accumulator) is twice the size of the divisor (the operand),
the quotient and remainder have the same size as the divisor. If divisor is
byte, the dividend is taken from AX register, the quotient is stored in AL and
the remainder is stored in AH. If divisor is word, the upper half of dividend
is taken from DX, the lower half of dividend is taken from AX, the quotient is
stored in AX and the remainder is stored in DX. If divisor is double word,
the upper half of dividend is taken from EDX, the lower half of dividend is
taken from EAX, the quotient is stored in EAX and the remainder is stored in
EDX. Rules for the operand are the same as for the <span class="smallcode">mul</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">idiv</span> performs a signed division of the accumulator by the operand.
It uses the same registers as the <span class="smallcode">div</span> instruction, and the rules for
the operand are the same.
</p>
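<p class="smalltext">
The sketch below shows one unsigned division by a byte and one signed division
by a word. In the second case <span class="smallcode">cwd</span> is used to prepare the upper half of
the dividend in DX:
</p>
<pre class="smallcode">    mov ax,100
    mov bl,7
    div bl          ; AL = 14 (quotient), AH = 2 (remainder)

    mov ax,-100
    cwd             ; extend the sign of AX into DX
    mov cx,7
    idiv cx         ; AX = -14 (quotient), DX = -2 (remainder)
</pre>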

<p><b>
<a name="2.1.4" class="smalltext">2.1.4  Decimal arithmetic instructions</a>
</b></p>

<p class="smalltext">
Decimal arithmetic is performed by combining the binary arithmetic
instructions (already described in the prior section) with the decimal
arithmetic instructions. The decimal arithmetic instructions are used to
adjust the results of a previous binary arithmetic operation to produce a
valid packed or unpacked decimal result, or to adjust the inputs to a
subsequent binary arithmetic operation so the operation will produce a valid
packed or unpacked decimal result.
</p>
<p class="smalltext">
<span class="smallcode">daa</span> adjusts the result of adding two valid packed decimal operands in
AL. <span class="smallcode">daa</span> must always follow the addition of two pairs of packed decimal
numbers (one digit in each half-byte) to obtain a pair of valid packed
decimal digits as results. The carry flag is set if carry was needed.
This instruction has no operands.
</p>
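<p class="smalltext">
As an illustration, adding the packed decimal numbers 38 and 45:
</p>
<pre class="smallcode">    mov al,38h      ; packed decimal 38
    add al,45h      ; binary sum is 7Dh, not a valid packed decimal
    daa             ; AL = 83h, packed decimal 83
</pre>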
<p class="smalltext">
<span class="smallcode">das</span> adjusts the result of subtracting two valid packed decimal operands
in AL. <span class="smallcode">das</span> must always follow the subtraction of one pair of packed decimal
numbers (one digit in each half-byte) from another to obtain a pair of valid
packed decimal digits as results. The carry flag is set if a borrow was
needed. This instruction has no operands.
</p>
<p class="smalltext">
<span class="smallcode">aaa</span> changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. <span class="smallcode">aaa</span> must always follow the addition
of two unpacked decimal operands in AL. The carry flag is set and AH is
incremented if a carry is necessary. This instruction has no operands.
</p>
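<p class="smalltext">
As an illustration, adding the unpacked decimal digits 8 and 7, with AH cleared
beforehand:
</p>
<pre class="smallcode">    mov ax,0008h    ; AH = 0, AL = unpacked digit 8
    add al,7        ; binary sum is 0Fh
    aaa             ; AH = 1, AL = 5, unpacked decimal 15
</pre>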
<p class="smalltext">
<span class="smallcode">aas</span> changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. <span class="smallcode">aas</span> must always follow the
subtraction of one unpacked decimal operand from another in AL. The carry flag
is set and AH decremented if a borrow is necessary. This instruction has no
operands.
</p>
<p class="smalltext">
<span class="smallcode">aam</span> corrects the result of a multiplication of two valid unpacked decimal
numbers. <span class="smallcode">aam</span> must always follow the multiplication of two decimal numbers
to produce a valid decimal result. The high order digit is left in AH, the
low order digit in AL. The generalized version of this instruction allows
adjustment of the contents of the AX to create two unpacked digits of any
number base. The standard version of this instruction has no operands, the
generalized version has one operand - an immediate value specifying the
number base for the created digits.
</p>
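<p class="smalltext">
As an illustration, multiplying the unpacked decimal digits 7 and 9:
</p>
<pre class="smallcode">    mov al,7
    mov bl,9
    mul bl          ; AX = 63 (003Fh)
    aam             ; AH = 6, AL = 3, unpacked decimal 63
</pre>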
<p class="smalltext">
<span class="smallcode">aad</span> modifies the numerator in AH and AL to prepare for the division of two
valid unpacked decimal operands so that the quotient produced by the division
will be a valid unpacked decimal number. AH should contain the high order
digit and AL the low order digit. This instruction adjusts the value and
places the result in AL, while AH will contain zero. The generalized version
of this instruction allows adjustment of two unpacked digits of any number
base. Rules for the operand are the same as for the <span class="smallcode">aam</span> instruction.
</p>
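<p class="smalltext">
As an illustration, dividing the unpacked decimal number 63 by 9:
</p>
<pre class="smallcode">    mov ax,0603h    ; AH = 6, AL = 3, unpacked decimal 63
    aad             ; AL = 63, AH = 0
    mov bl,9
    div bl          ; AL = 7 (quotient), AH = 0 (remainder)
</pre>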

<p><b>
<a name="2.1.5" class="smalltext">2.1.5  Logical instructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">not</span> inverts the bits in the specified operand to form a one's
complement of the operand. It has no effect on the flags. Rules for the
operand are the same as for the <span class="smallcode">inc</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">and</span>, <span class="smallcode">or</span> and <span class="smallcode">xor</span> instructions perform the standard
logical operations. They update the SF, ZF and PF flags. Rules for the
operands are the same as for the <span class="smallcode">add</span> instruction.
</p>
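<p class="smalltext">
These instructions are commonly used with bit masks, as in this sketch with
arbitrary values:
</p>
<pre class="smallcode">    mov al,10110101b
    and al,00001111b    ; AL = 00000101b, only low four bits are kept
    or  al,10000000b    ; AL = 10000101b, the highest bit is set
    xor al,11111111b    ; AL = 01111010b, every bit is inverted
</pre>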
<p class="smalltext">
<span class="smallcode">bt</span>, <span class="smallcode">bts</span>, <span class="smallcode">btr</span> and <span class="smallcode">btc</span> instructions operate on a single bit which can
be in memory or in a general register. The location of the bit is specified
as an offset from the low order end of the operand. The value of the offset
is taken from the second operand, it may be either an immediate byte or
a general register. These instructions first assign the value of the selected
bit to CF. <span class="smallcode">bt</span> instruction does nothing more, <span class="smallcode">bts</span> sets the selected bit to
1, <span class="smallcode">btr</span> resets the selected bit to 0, <span class="smallcode">btc</span> changes the bit to its
complement. The first operand can be word or double word.
</p>
<pre class="smallcode">    bt  ax,15        ; test bit in register
    bts word [bx],15 ; test and set bit in memory
    btr ax,cx        ; test and reset bit in register
    btc word [bx],cx ; test and complement bit in memory
</pre>
<p class="smalltext">
<span class="smallcode">bsf</span> and <span class="smallcode">bsr</span> instructions scan a word or double word for the first set
bit and store the index of this bit into the destination operand, which must
be a general register. The bit string being scanned is specified by the source
operand, which may be either a general register or memory. The ZF flag is set if the entire
string is zero (no set bits are found); otherwise it is cleared. If no set bit
is found, the value of the destination register is undefined. <span class="smallcode">bsf</span> scans from
low order to high order (starting from bit index zero). <span class="smallcode">bsr</span> scans from high
order to low order (starting from bit index 15 of a word or index 31 of a
double word).
</p>
<pre class="smallcode">    bsf ax,bx        ; scan register forward
    bsr ax,[si]      ; scan memory reverse
</pre>
<p class="smalltext">
<span class="smallcode">shl</span> shifts the destination operand left by the number of bits specified
in the second operand. The destination operand can be byte, word, or double
word general register or memory. The second operand can be an immediate value
or the CL register. The processor shifts zeros in from the right (low order)
side of the operand as bits exit from the left side. The last bit that exited
is stored in CF. <span class="smallcode">sal</span> is a synonym for <span class="smallcode">shl</span>.
</p>
<pre class="smallcode">    shl al,1         ; shift register left by one bit
    shl byte [bx],1  ; shift memory left by one bit
    shl ax,cl        ; shift register left by count from cl
    shl word [bx],cl ; shift memory left by count from cl
</pre>
<p class="smalltext">
<span class="smallcode">shr</span> and <span class="smallcode">sar</span> shift the destination operand right by the number of bits
specified in the second operand. Rules for operands are the same as for the
<span class="smallcode">shl</span> instruction. <span class="smallcode">shr</span> shifts zeros in from the left side of the operand as
bits exit from the right side. The last bit that exited is stored in CF.
<span class="smallcode">sar</span> preserves the sign of the operand by shifting in zeros on the left side
if the value is positive or by shifting in ones if the value is negative.
</p>
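<p class="smalltext">
A short sketch of the operand forms, mirroring the <span class="smallcode">shl</span> examples (the
registers are chosen arbitrarily):
</p>
<pre class="smallcode">    shr ax,1         ; shift register right, zero enters from the left
    sar ax,1         ; shift register right, sign bit is replicated
    shr byte [bx],cl ; shift memory right by count from cl
    sar word [bx],cl ; shift memory right by count from cl, keeping sign
</pre>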
<p class="smalltext">
<span class="smallcode">shld</span> shifts bits of the destination operand to the left by the number
of bits specified in third operand, while shifting high order bits from the
source operand into the destination operand on the right. The source operand
remains unmodified. The destination operand can be a word or double word
general register or memory, the source operand must be a general register,
and the third operand can be an immediate value or the CL register.
</p>
<pre class="smallcode">    shld ax,bx,1     ; shift register left by one bit
    shld [di],bx,1   ; shift memory left by one bit
    shld ax,bx,cl    ; shift register left by count from cl
    shld [di],bx,cl  ; shift memory left by count from cl
</pre>
<p class="smalltext">
<span class="smallcode">shrd</span> shifts bits of the destination operand to the right, while shifting
low order bits from the source operand into the destination operand on the
left. The source operand remains unmodified. Rules for operands are the same
as for the <span class="smallcode">shld</span> instruction.
</p>
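<p class="smalltext">
For illustration, the same operand forms as shown for <span class="smallcode">shld</span> apply (the
registers are chosen arbitrarily):
</p>
<pre class="smallcode">    shrd ax,bx,1     ; shift register right by one bit
    shrd [di],bx,cl  ; shift memory right by count from cl
</pre>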
<p class="smalltext">
<span class="smallcode">rol</span> and <span class="smallcode">rcl</span> rotate the byte, word or double word destination operand
left by the number of bits specified in the second operand. For each rotation
specified, the high order bit that exits from the left of the operand returns
at the right to become the new low order bit. <span class="smallcode">rcl</span> additionally puts in CF
each high order bit that exits from the left side of the operand before it
returns to the operand as the low order bit on the next rotation cycle. Rules
for operands are the same as for the <span class="smallcode">shl</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">ror</span> and <span class="smallcode">rcr</span> rotate the byte, word or double word destination operand
right by the number of bits specified in the second operand. For each rotation
specified, the low order bit that exits from the right of the operand returns
at the left to become the new high order bit. <span class="smallcode">rcr</span> additionally puts in CF
each low order bit that exits from the right side of the operand before it
returns to the operand as the high order bit on the next rotation cycle.
Rules for operands are the same as for the <span class="smallcode">shl</span> instruction.
</p>
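<p class="smalltext">
A brief sketch of the rotate instructions, using the same operand forms as
<span class="smallcode">shl</span> (the registers are chosen arbitrarily):
</p>
<pre class="smallcode">    rol al,1         ; rotate register left by one bit
    rcl word [bx],1  ; rotate memory left through carry by one bit
    ror eax,cl       ; rotate register right by count from cl
    rcr byte [si],cl ; rotate memory right through carry by count from cl
</pre>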
<p class="smalltext">
<span class="smallcode">test</span> performs the same action as the <span class="smallcode">and</span> instruction, but it does not
alter the destination operand, only updates flags. Rules for the operands are
the same as for the <span class="smallcode">and</span> instruction.
</p>
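<p class="smalltext">
For example (a minimal sketch with arbitrarily chosen operands), <span class="smallcode">test</span> can
check selected bits without modifying the tested value:
</p>
<pre class="smallcode">    test al,80h      ; check the highest bit of al
    test [bx],cx     ; check bits of word in memory selected by cx
</pre>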
<p class="smalltext">
<span class="smallcode">bswap</span> reverses the byte order of a 32-bit general register: bits 0 through
7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with
bits 16 through 23. This instruction is provided for converting little-endian
values to big-endian format and vice versa.
</p>
<pre class="smallcode">    bswap edx        ; swap bytes in register
</pre>

<p><b>
<a name="2.1.6" class="smalltext">2.1.6  Control transfer instructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">jmp</span> unconditionally transfers control to the target location. The
destination address can be specified directly within the instruction or
indirectly through a register or memory, the acceptable size of this address
depends on whether the jump is near or far (it can be specified by preceding
the operand with <span class="smallcode">near</span> or <span class="smallcode">far</span> operator) and whether the instruction is
16-bit or 32-bit. Operand for near jump should be <span class="smallcode">word</span> size for 16-bit
instruction or the <span class="smallcode">dword</span> size for 32-bit instruction. Operand for far jump
should be <span class="smallcode">dword</span> size for 16-bit instruction or <span class="smallcode">pword</span> size for 32-bit
instruction. A direct <span class="smallcode">jmp</span> instruction includes the destination address as
part of the instruction (and can be preceded by the <span class="smallcode">short</span>, <span class="smallcode">near</span> or <span class="smallcode">far</span>
operator). The operand specifying the address should be a numerical expression
for a near or short jump, or two numerical expressions separated with a colon
for a far jump: the first specifies the selector of the segment, the second is
the offset within the segment. The <span class="smallcode">pword</span> operator can be used to force the
32-bit far jump, and <span class="smallcode">dword</span> to force the 16-bit far jump. An indirect <span class="smallcode">jmp</span>
instruction obtains the destination address through a register or a pointer
variable; the operand should be a general register or memory. See also 1.2.5 for
some more details.
</p>
<pre class="smallcode">    jmp 100h         ; direct near jump
    jmp 0FFFFh:0     ; direct far jump
    jmp ax           ; indirect near jump
    jmp pword [ebx]  ; indirect far jump
</pre>
<p class="smalltext">
<span class="smallcode">call</span> transfers control to the procedure, saving on the stack the address
of the instruction following the <span class="smallcode">call</span> for later use by a <span class="smallcode">ret</span> (return)
instruction. Rules for the operands are the same as for the <span class="smallcode">jmp</span> instruction,
but the <span class="smallcode">call</span> has no short variant of direct instruction and thus it is
not optimized.
</p>
<p class="smalltext">
<span class="smallcode">ret</span>, <span class="smallcode">retn</span> and <span class="smallcode">retf</span> instructions terminate the execution of a procedure
and transfer control back to the program that originally invoked the
procedure using the address that was stored on the stack by the <span class="smallcode">call</span>
instruction. <span class="smallcode">ret</span> is the equivalent of <span class="smallcode">retn</span>, which returns from the
procedure that was executed using the near call, while <span class="smallcode">retf</span> returns from
the procedure that was executed using the far call. These instructions default
to the size of address appropriate for the current code setting, but the size
of address can be forced to 16-bit by using the <span class="smallcode">retw</span>, <span class="smallcode">retnw</span> and <span class="smallcode">retfw</span>
mnemonics, and to 32-bit by using the <span class="smallcode">retd</span>, <span class="smallcode">retnd</span> and <span class="smallcode">retfd</span> mnemonics.
All these instructions may optionally specify an immediate operand; by adding
this constant to the stack pointer, they effectively remove any arguments that
the calling program pushed on the stack before the execution of the <span class="smallcode">call</span>
instruction.
</p>
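<p class="smalltext">
The following is only a minimal sketch of how <span class="smallcode">call</span> and <span class="smallcode">retn</span> with an
immediate operand work together. It assumes 32-bit code, and the labels
<span class="smallcode">twice</span> and <span class="smallcode">finished</span> are hypothetical names introduced for this illustration.
</p>
<pre class="smallcode">        push  dword 5        ; pass one double word argument on the stack
        call  twice          ; store return address and enter the procedure
        jmp   finished       ; eax now holds 10

twice:                       ; hypothetical procedure doubling its argument
        mov   eax,[esp+4]    ; argument lies just above the return address
        add   eax,eax
        retn  4              ; return and remove the 4-byte argument
finished:
</pre>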
<p class="smalltext">
<span class="smallcode">iret</span> returns control to an interrupted procedure. It differs from <span class="smallcode">ret</span> in
that it also pops the flags from the stack into the flags register. The flags
are stored on the stack by the interrupt mechanism. It defaults to the size of
return address appropriate for the current code setting, but it can be forced
to use 16-bit or 32-bit address by using the <span class="smallcode">iretw</span> or <span class="smallcode">iretd</span> mnemonic.
</p>
<p class="smalltext">
The conditional transfer instructions are jumps that may or may not transfer
control, depending on the state of the CPU flags when the instruction
executes. The mnemonics for conditional jumps may be obtained by attaching
the condition mnemonic (see table <a href="#_2.1">2.1</a>) to the <span class="smallcode">j</span> mnemonic,
for example the <span class="smallcode">jc</span> instruction transfers control when the CF flag is
set. The conditional jumps can be short or near, and direct only, and can be optimized
(see <a href="#1.2.5">1.2.5</a>). The operand should be an immediate value specifying the target
address.
</p>
<p class="smalltext">
<b><a name="_2.1">Table 2.1  Conditions</a></b>
</p>
<table class="doctable" style="width: 400px;">
  <tr>
    <th>Mnemonic</th>
    <th>Condition tested</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><span class="smallcode">o</span></td>
    <td>OF = 1</td>
    <td>overflow</td>
  </tr>
  <tr>
    <td><span class="smallcode">no</span></td>
    <td>OF = 0</td>
    <td>not overflow</td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">c</span></td></tr>
        <tr><td><span class="smallcode">b</span></td></tr>
        <tr><td><span class="smallcode">nae</span></td></tr>
      </table>
    </td>
    <td>CF = 1</td>
    <td>
      <table class="intable">
        <tr><td>carry</td></tr>
        <tr><td>below</td></tr>
        <tr><td>not above nor equal</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">nc</span></td></tr>
        <tr><td><span class="smallcode">ae</span></td></tr>
        <tr><td><span class="smallcode">nb</span></td></tr>
      </table>
    </td>
    <td>CF = 0</td>
    <td>
      <table class="intable">
        <tr><td>not carry</td></tr>
        <tr><td>above or equal</td></tr>
        <tr><td>not below</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">e</span></td></tr>
        <tr><td><span class="smallcode">z</span></td></tr>
      </table>
    </td>
    <td>ZF = 1</td>
    <td>
      <table class="intable">
        <tr><td>equal</td></tr>
        <tr><td>zero</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">ne</span></td></tr>
        <tr><td><span class="smallcode">nz</span></td></tr>
      </table>
    </td>
    <td>ZF = 0</td>
    <td>
      <table class="intable">
        <tr><td>not equal</td></tr>
        <tr><td>not zero</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">be</span></td></tr>
        <tr><td><span class="smallcode">na</span></td></tr>
      </table>
    </td>
    <td>CF <span class="smallcode">or</span> ZF = 1</td>
    <td>
      <table class="intable">
        <tr><td>below or equal</td></tr>
        <tr><td>not above</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">a</span></td></tr>
        <tr><td><span class="smallcode">nbe</span></td></tr>
      </table>
    </td>
    <td>CF <span class="smallcode">or</span> ZF = 0</td>
    <td>
      <table class="intable">
        <tr><td>above</td></tr>
        <tr><td>not below nor equal</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td><span class="smallcode">s</span></td>
    <td>SF = 1</td>
    <td>sign</td>
  </tr>
  <tr>
    <td><span class="smallcode">ns</span></td>
    <td>SF = 0</td>
    <td>not sign</td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">p</span></td></tr>
        <tr><td><span class="smallcode">pe</span></td></tr>
      </table>
    </td>
    <td>PF = 1</td>
    <td>
      <table class="intable">
        <tr><td>parity</td></tr>
        <tr><td>parity even</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">np</span></td></tr>
        <tr><td><span class="smallcode">po</span></td></tr>
      </table>
    </td>
    <td>PF = 0</td>
    <td>
      <table class="intable">
        <tr><td>not parity</td></tr>
        <tr><td>parity odd</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">l</span></td></tr>
        <tr><td><span class="smallcode">nge</span></td></tr>
      </table>
    </td>
    <td>SF <span class="smallcode">xor</span> OF = 1</td>
    <td>
      <table class="intable">
        <tr><td>less</td></tr>
        <tr><td>not greater nor equal</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">ge</span></td></tr>
        <tr><td><span class="smallcode">nl</span></td></tr>
      </table>
    </td>
    <td>SF <span class="smallcode">xor</span> OF = 0</td>
    <td>
      <table class="intable">
        <tr><td>greater or equal</td></tr>
        <tr><td>not less</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">le</span></td></tr>
        <tr><td><span class="smallcode">ng</span></td></tr>
      </table>
    </td>
    <td>(SF <span class="smallcode">xor</span> OF) <span class="smallcode">or</span> ZF = 1</td>
    <td>
      <table class="intable">
        <tr><td>less or equal</td></tr>
        <tr><td>not greater</td></tr>
      </table>
    </td>
  </tr>
  <tr>
    <td>
      <table class="intable">
        <tr><td><span class="smallcode">g</span></td></tr>
        <tr><td><span class="smallcode">nle</span></td></tr>
      </table>
    </td>
    <td>(SF <span class="smallcode">xor</span> OF) <span class="smallcode">or</span> ZF = 0</td>
    <td>
      <table class="intable">
        <tr><td>greater</td></tr>
        <tr><td>not less nor equal</td></tr>
      </table>
    </td>
  </tr>
</table>
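<p class="smalltext">
As a small sketch of how the conditions from the table are used, the fragment
below leaves in AX the larger of two unsigned values. The label <span class="smallcode">keep</span>
is a hypothetical name chosen for this illustration.
</p>
<pre class="smallcode">        cmp   ax,bx          ; set flags according to ax - bx
        jae   keep           ; CF = 0: ax is above or equal (unsigned)
        mov   ax,bx          ; otherwise bx is the larger value
keep:                        ; ax holds the larger of the two values
</pre>
<p class="smalltext">
For signed values the <span class="smallcode">jge</span> mnemonic would be used in place of <span class="smallcode">jae</span>.
</p>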
<p class="smalltext">
The <span class="smallcode">loop</span> instructions are conditional jumps that use a value placed in
CX (or ECX) to specify the number of repetitions of a software loop. All
<span class="smallcode">loop</span> instructions automatically decrement CX (or ECX) and terminate the
loop (don't transfer the control) when CX (or ECX) is zero. They use CX or ECX
depending on whether the current code setting is 16-bit or 32-bit, but can be
forced to use CX with the <span class="smallcode">loopw</span> mnemonic or to use ECX with the <span class="smallcode">loopd</span> mnemonic.
<span class="smallcode">loope</span> and <span class="smallcode">loopz</span> are synonyms for the same instruction, which acts as
the standard <span class="smallcode">loop</span>, but also terminates the loop when the ZF flag is not set.
<span class="smallcode">loopew</span> and <span class="smallcode">loopzw</span> mnemonics force them to use CX register while <span class="smallcode">looped</span>
and <span class="smallcode">loopzd</span> force them to use ECX register. <span class="smallcode">loopne</span> and <span class="smallcode">loopnz</span> are
synonyms for the same instruction, which acts as the standard <span class="smallcode">loop</span>, but
also terminates the loop when the ZF flag is set. <span class="smallcode">loopnew</span> and <span class="smallcode">loopnzw</span>
mnemonics force them to use CX register while <span class="smallcode">loopned</span> and <span class="smallcode">loopnzd</span> force
them to use ECX register. Every <span class="smallcode">loop</span> instruction needs an operand that is an
immediate value specifying the target address; it can be only a short jump (in
the range of 128 bytes back and 127 bytes forward from the address of the
instruction following the <span class="smallcode">loop</span> instruction).
</p>
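<p class="smalltext">
A minimal sketch of a counted loop follows. The label <span class="smallcode">sum_next</span> is a
hypothetical name, and <span class="smallcode">loopw</span> is used so that CX is the counter regardless
of the code setting.
</p>
<pre class="smallcode">        xor   ax,ax          ; clear the sum
        mov   cx,10          ; number of repetitions
sum_next:
        add   ax,cx          ; add the current counter value
        loopw sum_next       ; decrement cx, repeat while it is not zero
                             ; ax holds 10+9+...+1 = 55 afterwards
</pre>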
<p class="smalltext">
<span class="smallcode">jcxz</span> branches to the label specified in the instruction if it finds a
value of zero in CX, <span class="smallcode">jecxz</span> does the same, but checks the value of ECX
instead of CX. Rules for the operands are the same as for the <span class="smallcode">loop</span>
instruction.
</p>
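<p class="smalltext">
One common use, sketched here under the assumption that BX points to a buffer
and CX holds its length in bytes, is to skip a loop entirely when the counter
is zero. The labels <span class="smallcode">fill</span> and <span class="smallcode">skip</span> are hypothetical.
</p>
<pre class="smallcode">        jcxz  skip           ; with cx = 0 the loop would run 65536 times
fill:
        mov   byte [bx],0    ; clear one byte
        inc   bx             ; advance to the next byte
        loopw fill           ; decrement cx, repeat while it is not zero
skip:
</pre>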
<p class="smalltext">
<span class="smallcode">int</span> activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction, the number should be in
range from 0 to 255. The interrupt service routine terminates with an <span class="smallcode">iret</span>
instruction that returns control to the instruction that follows <span class="smallcode">int</span>.
<span class="smallcode">int3</span> mnemonic codes the short (one byte) trap that invokes the interrupt 3.
<span class="smallcode">into</span> instruction invokes the interrupt 4 if the OF flag is set.
</p>
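<p class="smalltext">
For illustration (the interrupt number 21h is an arbitrary choice):
</p>
<pre class="smallcode">    int 21h          ; call the service routine of interrupt 21h
    int3             ; one byte breakpoint trap
    into             ; call interrupt 4 if overflow flag is set
</pre>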
<p class="smalltext">
<span class="smallcode">bound</span> verifies that the signed value contained in the specified register
lies within specified limits. An interrupt 5 occurs if the value contained in
the register is less than the lower bound or greater than the upper bound. It
needs two operands, the first operand specifies the register being tested,
the second operand should be memory address for the two signed limit values.
The operands can be <span class="smallcode">word</span> or <span class="smallcode">dword</span> in size.
</p>
<pre class="smallcode">    bound ax,[bx]    ; check word for bounds
    bound eax,[esi]  ; check double word for bounds
</pre>

<p><b>
<a name="2.1.7" class="smalltext">2.1.7  I/O instructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">in</span> transfers a byte, word, or double word from an input port to AL, AX,
or EAX. I/O ports can be addressed either directly, with the immediate byte
value coded in instruction, or indirectly via the DX register. The destination
operand should be AL, AX, or EAX register. The source operand should be an
immediate value in range from 0 to 255, or DX register.
</p>
<pre class="smallcode">    in al,20h        ; input byte from port 20h
    in ax,dx         ; input word from port addressed by dx
</pre>
<p class="smalltext">
<span class="smallcode">out</span> transfers a byte, word, or double word to an output port from AL, AX,
or EAX. The program can specify the number of the port using the same methods
as the <span class="smallcode">in</span> instruction. The destination operand should be an immediate value
in range from 0 to 255, or DX register. The source operand should be AL, AX,
or EAX register.
</p>
<pre class="smallcode">    out 20h,ax       ; output word to port 20h
    out dx,al        ; output byte to port addressed by dx
</pre>

<p><b>
<a name="2.1.8" class="smalltext">2.1.8  Strings operations</a>
</b></p>

<p class="smalltext">
The string operations operate on one element of a string. A string element
may be a byte, a word, or a double word. The string elements are addressed by
SI and DI (or ESI and EDI) registers. After every string operation SI and/or
DI (or ESI and/or EDI) are automatically updated to point to the next element
of the string. If DF (direction flag) is zero, the index registers are
incremented, if DF is one, they are decremented. The amount of the increment
or decrement is 1, 2, or 4 depending on the size of the string element. Every
string operation instruction has short forms which have no operands and use
SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code
type is 32-bit. SI and ESI by default address data in the segment selected
by DS, DI and EDI always address data in the segment selected by ES. The short
form is obtained by attaching to the mnemonic of the string operation a letter
specifying the size of the string element: <span class="smallcode">b</span> for byte element,
<span class="smallcode">w</span> for word element, and <span class="smallcode">d</span> for double word element. The full form of a string
operation needs operands providing the size operator and the memory addresses,
which can be SI or ESI with any segment prefix, and DI or EDI always with the ES
segment prefix.
</p>
<p class="smalltext">
<span class="smallcode">movs</span> transfers the string element pointed to by SI (or ESI) to the
location pointed to by DI (or EDI). Size of operands can be byte, word, or
double word. The destination operand should be memory addressed by DI or EDI,
the source operand should be memory addressed by SI or ESI with any segment
prefix.
</p>
<pre class="smallcode">    movs byte [di],[si]        ; transfer byte
    movs word [es:di],[ss:si]  ; transfer word
    movsd                      ; transfer double word
</pre>
<p class="smalltext">
<span class="smallcode">cmps</span> subtracts the destination string element from the source string
element and updates the flags AF, SF, PF, CF and OF, but it does not change
any of the compared elements. If the string elements are equal, ZF is set,
otherwise it is cleared. The first operand for this instruction should be the
source string element addressed by SI or ESI with any segment prefix, the
second operand should be the destination string element addressed by DI or
EDI.
</p>
<pre class="smallcode">    cmpsb                      ; compare bytes
    cmps word [ds:si],[es:di]  ; compare words
    cmps dword [fs:esi],[edi]  ; compare double words
</pre>
<p class="smalltext">
<span class="smallcode">scas</span> subtracts the destination string element from AL, AX, or EAX
(depending on the size of string element) and updates the flags AF, SF, ZF,
PF, CF and OF. If the values are equal, ZF is set, otherwise it is cleared.
The operand should be the destination string element addressed by DI or EDI.
</p>
<pre class="smallcode">    scas byte [es:di]          ; scan byte
    scasw                      ; scan word
    scas dword [es:edi]        ; scan double word
</pre>
<p class="smalltext">
<span class="smallcode">stos</span> places the value of AL, AX, or EAX into the destination string
element. Rules for the operand are the same as for the <span class="smallcode">scas</span> instruction.
</p>
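<p class="smalltext">
A short sketch of the possible forms, following the pattern of the <span class="smallcode">scas</span>
examples:
</p>
<pre class="smallcode">    stos byte [es:di]          ; store byte
    stosw                      ; store word
    stos dword [es:edi]        ; store double word
</pre>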
<p class="smalltext">
<span class="smallcode">lods</span> places the source string element into AL, AX, or EAX. The operand
should be the source string element addressed by SI or ESI with any segment
prefix.
</p>
<pre class="smallcode">    lods byte [ds:si]           ; load byte
    lods word [cs:si]           ; load word
    lodsd                       ; load double word
</pre>
<p class="smalltext">
<span class="smallcode">ins</span> transfers a byte, word, or double word from an input port addressed
by DX register to the destination string element. The destination operand
should be memory addressed by DI or EDI, the source operand should be the DX
register.
</p>
<pre class="smallcode">    insb                       ; input byte
    ins word [es:di],dx        ; input word
    ins dword [edi],dx         ; input double word
</pre>
<p class="smalltext">
<span class="smallcode">outs</span> transfers the source string element to an output port addressed by
DX register. The destination operand should be the DX register and the source
operand should be memory addressed by SI or ESI with any segment prefix.
</p>
<pre class="smallcode">    outs dx,byte [si]          ; output byte
    outsw                      ; output word
    outs dx,dword [gs:esi]     ; output double word
</pre>
<p class="smalltext">
The repeat prefixes <span class="smallcode">rep</span>, <span class="smallcode">repe</span>/<span class="smallcode">repz</span>, and <span class="smallcode">repne</span>/<span class="smallcode">repnz</span> specify
repeated string operation. When a string operation instruction has a repeat
prefix, the operation is executed repeatedly, each time using a different
element of the string. The repetition terminates when one of the conditions
specified by the prefix is satisfied. All three prefixes automatically
decrease CX or ECX register (depending whether string operation instruction
uses the 16-bit or 32-bit addressing) after each operation and repeat the
associated operation until CX or ECX is zero. <span class="smallcode">repe</span>/<span class="smallcode">repz</span> and
<span class="smallcode">repne</span>/<span class="smallcode">repnz</span> are used exclusively with the <span class="smallcode">scas</span> and <span class="smallcode">cmps</span> instructions
(described above). When these prefixes are used, repetition of the next
instruction depends also on the zero flag (ZF): <span class="smallcode">repe</span> and <span class="smallcode">repz</span> terminate
the execution when the ZF is zero, <span class="smallcode">repne</span> and <span class="smallcode">repnz</span> terminate the execution
when the ZF is set.
</p>
<pre class="smallcode">    rep  movsd       ; transfer multiple double words
    repe cmpsb       ; compare bytes until not equal
</pre>
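<p class="smalltext">
Two typical uses are sketched below, only as an illustration. Both fragments
assume 32-bit code and that ES:EDI already points to the data: the first to a
buffer of at least 64 bytes, the second to a zero-terminated string.
</p>
<pre class="smallcode">        cld                  ; DF = 0, so edi is incremented
        xor   eax,eax        ; value to store
        mov   ecx,16         ; number of double words
        rep   stosd          ; zero 64 bytes starting at es:edi

        cld                  ; DF = 0, so edi is incremented
        xor   al,al          ; byte to search for
        mov   ecx,-1         ; largest possible count
        repne scasb          ; scan until zero byte is found
        not   ecx            ; number of bytes scanned, including zero
        dec   ecx            ; length of string without terminating zero
</pre>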

<p><b>
<a name="2.1.9" class="smalltext">2.1.9  Flag control instructions</a>
</b></p>

<p class="smalltext">
The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.
</p>
<p class="smalltext">
<span class="smallcode">stc</span> sets the CF (carry flag) to 1, <span class="smallcode">clc</span> zeroes the CF, <span class="smallcode">cmc</span> changes the
CF to its complement. <span class="smallcode">std</span> sets the DF (direction flag) to 1, <span class="smallcode">cld</span> zeroes
the DF, <span class="smallcode">sti</span> sets the IF (interrupt flag) to 1 and therefore enables the
interrupts, <span class="smallcode">cli</span> zeroes the IF and therefore disables the interrupts.
</p>
<p class="smalltext">
<span class="smallcode">lahf</span> copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
AH register. The contents of the remaining bits are undefined. The flags
remain unaffected.
</p>
<p class="smalltext">
<span class="smallcode">sahf</span> transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
AF, PF, and CF.
</p>
<p class="smalltext">
<span class="smallcode">pushf</span> decrements <span class="smallcode">esp</span> by two or four and stores the low word or
double word of flags register at the top of stack, size of stored data
depends on the current code setting. <span class="smallcode">pushfw</span> variant forces storing the
word and <span class="smallcode">pushfd</span> forces storing the double word.
</p>
<p class="smalltext">
<span class="smallcode">popf</span> transfers specific bits from the word or double word at the top
of stack, then increments <span class="smallcode">esp</span> by two or four, this value depends on
the current code setting. <span class="smallcode">popfw</span> variant forces restoring from the word
and <span class="smallcode">popfd</span> forces restoring from the double word.
</p>
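<p class="smalltext">
A common pattern, sketched here under the assumption that the code runs with
enough privilege to change IF (for example in real mode), saves the flags
before disabling interrupts and restores them afterwards. The instruction
between <span class="smallcode">cli</span> and <span class="smallcode">popf</span> is an arbitrary placeholder.
</p>
<pre class="smallcode">    pushf            ; save flags, including the current state of IF
    cli              ; disable interrupts
    inc word [bx]    ; work that should not be interrupted
    popf             ; restore flags with the previous state of IF
</pre>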

<p><b>
<a name="2.1.10" class="smalltext">2.1.10  Conditional operations</a>
</b></p>

<p class="smalltext">
The instructions obtained by attaching the condition mnemonic (see table <a href="#_2.1">2.1</a>)
to the <span class="smallcode">set</span> mnemonic set a byte to one if the condition is true and set
the byte to zero otherwise. The operand should be an 8-bit general register
or a byte in memory.
</p>
<pre class="smallcode">    setne al         ; set al if zero flag cleared
    seto byte [bx]   ; set byte if overflow
</pre>
<p class="smalltext">
<span class="smallcode">salc</span> instruction sets all bits of the AL register when the carry flag is
set and zeroes the AL register otherwise. This instruction has no operands.
</p>
<p class="smalltext">
The instructions obtained by attaching the condition mnemonic to the <span class="smallcode">cmov</span>
mnemonic transfer the word or double word from the general register or memory
to the general register only when the condition is true. The destination
operand should be general register, the source operand can be general register
or memory.
</p>
<pre class="smallcode">    cmove ax,bx      ; move when zero flag set
    cmovnc eax,[ebx] ; move when carry flag cleared
</pre>
<p class="smalltext">
<span class="smallcode">cmpxchg</span> compares the value in the AL, AX, or EAX register with the
destination operand. If the two values are equal, the source operand is
loaded into the destination operand. Otherwise, the destination operand is
loaded into the AL, AX, or EAX register. The destination operand may be a
general register or memory, the source operand must be a general register.
</p>
<pre class="smallcode">    cmpxchg dl,bl    ; compare and exchange with register
    cmpxchg [bx],dx  ; compare and exchange with memory
</pre>
<p class="smalltext">
<span class="smallcode">cmpxchg8b</span> compares the 64-bit value in EDX and EAX registers with the
destination operand. If the values are equal, the 64-bit value in ECX and EBX
registers is stored in the destination operand. Otherwise, the value in the
destination operand is loaded into EDX and EAX registers. The destination
operand should be a quad word in memory.
</p>
<pre class="smallcode">    cmpxchg8b [bx]   ; compare and exchange 8 bytes
</pre>

<p><b>
<a name="2.1.11" class="smalltext">2.1.11  Miscellaneous instructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">nop</span> instruction occupies one byte but affects nothing but the instruction
pointer. This instruction has no operands and doesn't perform any operation.
</p>
<p class="smalltext">
<span class="smallcode">ud2</span> instruction generates an invalid opcode exception. This instruction
is provided for software testing to explicitly generate an invalid opcode.
This instruction has no operands.
</p>
<p class="smalltext">
<span class="smallcode">xlat</span> replaces a byte in the AL register with a byte indexed by its value
in a translation table addressed by BX or EBX. The operand should be a byte
memory addressed by BX or EBX with any segment prefix. This instruction also
has a short form <span class="smallcode">xlatb</span> which has no operands and uses the BX or EBX address
(depending on the current code setting) in the segment selected by DS.
</p>
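<p class="smalltext">
As an illustration of the table lookup, the following sketch converts the low
four bits of AL into an ASCII hexadecimal digit. It assumes 32-bit code and a
flat DS segment; the <span class="smallcode">hex_digits</span> label is a hypothetical name chosen for this
example only.
</p>
<pre class="smallcode">    mov ebx,hex_digits   ; ebx = address of the translation table
    and al,0Fh           ; keep only the low nibble as the index
    xlatb                ; al = byte at ds:[ebx+al]
    ; elsewhere, in the data area:
hex_digits db '0123456789ABCDEF' ; hypothetical 16-byte table
</pre>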
<p class="smalltext">
<span class="smallcode">lds</span> transfers a pointer variable from the source operand to DS and the
destination register. The source operand must be a memory operand, and the
destination operand must be a general register. The DS register receives the
segment selector of the pointer while the destination register receives the
offset part of the pointer. <span class="smallcode">les</span>, <span class="smallcode">lfs</span>, <span class="smallcode">lgs</span> and <span class="smallcode">lss</span> operate identically
to <span class="smallcode">lds</span> except that instead of the DS register the ES, FS, GS and SS registers
are used respectively.
</p>
<pre class="smallcode">    lds bx,[si]      ; load pointer to ds:bx
</pre>
<p class="smalltext">
<span class="smallcode">lea</span> transfers the offset of the source operand (rather than its value)
to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.
</p>
<pre class="smallcode">    lea dx,[bx+si+1] ; load effective address to dx
</pre>
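<p class="smalltext">
Since <span class="smallcode">lea</span> only calculates the address and does not access memory, it is
also commonly used for simple arithmetic. A small sketch, assuming 32-bit code
(the choice of registers is arbitrary):
</p>
<pre class="smallcode">    lea eax,[eax+eax*4]  ; eax = eax*5, flags are not modified
    lea edx,[ebx+ecx+8]  ; edx = ebx+ecx+8, ebx and ecx are unchanged
</pre>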
<p class="smalltext">
<span class="smallcode">cpuid</span> returns processor identification and feature information in the
EAX, EBX, ECX, and EDX registers. The information returned is selected by
entering a value in the EAX register before the instruction is executed.
This instruction has no operands.
</p>
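<p class="smalltext">
For example, entering 0 in EAX makes the processor return the highest
supported function number in EAX and a 12-character vendor string in EBX, EDX
and ECX (in that order). The sketch below assumes 32-bit code; the <span class="smallcode">vendor</span>
buffer is a hypothetical name used only for illustration.
</p>
<pre class="smallcode">    xor eax,eax              ; select function 0
    cpuid                    ; eax = highest supported function number
    mov dword [vendor],ebx   ; characters 0-3 of the vendor string
    mov dword [vendor+4],edx ; characters 4-7
    mov dword [vendor+8],ecx ; characters 8-11
    ; elsewhere, in the data area:
vendor rb 12                 ; hypothetical 12-byte buffer
</pre>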
<p class="smalltext">
<span class="smallcode">pause</span> instruction delays the execution of the next instruction an
implementation specific amount of time. It can be used to improve the
performance of spin wait loops. This instruction has no operands.
</p>
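<p class="smalltext">
A minimal sketch of such a spin wait loop follows. It assumes 32-bit code and
that EBX points to a double word flag which some other processor will
eventually set to a nonzero value; the label names are hypothetical.
</p>
<pre class="smallcode">wait_flag:
    cmp dword [ebx],0    ; has the flag been set yet?
    jne flag_ready       ; yes, leave the loop
    pause                ; hint that this is a spin wait loop
    jmp wait_flag
flag_ready:
</pre>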
<p class="smalltext">
<span class="smallcode">enter</span> creates a stack frame that may be used to implement the scope rules
of block-structured high-level languages. A <span class="smallcode">leave</span> instruction at the end of
a procedure complements an <span class="smallcode">enter</span> at the beginning of the procedure to
simplify stack management and to control access to variables for nested
procedures. The <span class="smallcode">enter</span> instruction includes two parameters. The first
parameter specifies the number of bytes of dynamic storage to be allocated on
the stack for the routine being entered. The second parameter corresponds to
the lexical nesting level of the routine, it can be in range from 0 to 31.
The specified lexical level determines how many sets of stack frame pointers
the CPU copies into the new stack frame from the preceding frame. This list
of stack frame pointers is sometimes called the display. The first word (or
double word when code is 32-bit) of the display is a pointer to the last stack
frame. This pointer enables a <span class="smallcode">leave</span> instruction to reverse the action of the
previous <span class="smallcode">enter</span> instruction by effectively discarding the last stack frame.
After <span class="smallcode">enter</span> creates the new display for a procedure, it allocates the
dynamic storage space for that procedure by decrementing ESP by the number of
bytes specified in the first parameter. To enable a procedure to address its
display, <span class="smallcode">enter</span> leaves BP (or EBP) pointing to the beginning of the new stack
frame. If the lexical level is zero, <span class="smallcode">enter</span> pushes BP (or EBP), copies SP to
BP (or ESP to EBP) and then subtracts the first operand from ESP. For nesting
levels greater than zero, the processor pushes additional frame pointers on
the stack before adjusting the stack pointer.
</p>
<pre class="smallcode">    enter 2048,0     ; enter and allocate 2048 bytes on stack
</pre>
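<p class="smalltext">
The case of lexical level zero can be sketched as a complete procedure frame.
This is only an illustration assuming 32-bit code; <span class="smallcode">my_proc</span> is a hypothetical
label.
</p>
<pre class="smallcode">my_proc:
    enter 16,0           ; allocate 16 bytes, nesting level 0
    ; with 32-bit code the line above has the same effect as:
    ;   push ebp
    ;   mov ebp,esp
    ;   sub esp,16
    mov dword [ebp-4],0  ; use the first local double word
    leave                ; restore esp and ebp
    ret
</pre>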

<p><b>
<a name="2.1.12" class="smalltext">2.1.12  System instructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">lmsw</span> loads the operand into the machine status word (bits 0 through 15 of
CR0 register), while <span class="smallcode">smsw</span> stores the machine status word into the
destination operand. The operand for both those instructions can be 16-bit
general register or memory, for <span class="smallcode">smsw</span> it can also be 32-bit general
register.
</p>
<pre class="smallcode">    lmsw ax          ; load machine status from register
    smsw [bx]        ; store machine status to memory
</pre>
<p class="smalltext">
<span class="smallcode">lgdt</span> and <span class="smallcode">lidt</span> instructions load the values in operand into the global
descriptor table register or the interrupt descriptor table register
respectively. <span class="smallcode">sgdt</span> and <span class="smallcode">sidt</span> store the contents of the global descriptor
table register or the interrupt descriptor table register in the destination
operand. The operand should be 6 bytes in memory.
</p>
<pre class="smallcode">    lgdt [ebx]       ; load global descriptor table
</pre>
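<p class="smalltext">
The 6 bytes consist of a 16-bit limit followed by a 32-bit base address. The
following sketch shows one possible layout; it assumes that label addresses
are equal to linear addresses, and all the label names are hypothetical.
</p>
<pre class="smallcode">    lgdt [gdt_ptr]         ; load limit and base from the 6 bytes at gdt_ptr
    ; elsewhere, in the data area:
gdt_start:
    dq 0                   ; null descriptor required as the first entry
    ; further descriptors would follow here
gdt_end:
gdt_ptr:
    dw gdt_end-gdt_start-1 ; 16-bit limit (table size in bytes minus one)
    dd gdt_start           ; 32-bit base address of the table
</pre>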
<p class="smalltext">
<span class="smallcode">lldt</span> loads the operand into the segment selector field of the local
descriptor table register and <span class="smallcode">sldt</span> stores the segment selector from the
local descriptor table register in the operand. <span class="smallcode">ltr</span> loads the operand into
the segment selector field of the task register and <span class="smallcode">str</span> stores the segment
selector from the task register in the operand. Rules for operand are the same
as for the <span class="smallcode">lmsw</span> and <span class="smallcode">smsw</span> instructions.
</p>
<p class="smalltext">
<span class="smallcode">lar</span> loads the access rights from the segment descriptor specified by
the selector in source operand into the destination operand and sets the ZF
flag. The destination operand can be a 16-bit or 32-bit general register.
The source operand should be a 16-bit general register or memory.
</p>
<pre class="smallcode">    lar ax,[bx]      ; load access rights into word
    lar eax,dx       ; load access rights into double word
</pre>
<p class="smalltext">
<span class="smallcode">lsl</span> loads the segment limit from the segment descriptor specified by the
selector in source operand into the destination operand and sets the ZF flag.
Rules for operand are the same as for the <span class="smallcode">lar</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">verr</span> and <span class="smallcode">verw</span> verify whether the code or data segment specified with
the operand is readable or writable from the current privilege level. The
operand should be a word, it can be general register or memory. If the segment
is accessible and readable (for <span class="smallcode">verr</span>) or writable (for <span class="smallcode">verw</span>) the ZF flag
is set, otherwise it's cleared. Rules for operand are the same as for the
<span class="smallcode">lldt</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">arpl</span> compares the RPL (requestor's privilege level) fields of two segment
selectors. The first operand contains one segment selector and the second
operand contains the other. If the RPL field of the destination operand is
less than the RPL field of the source operand, the ZF flag is set and the RPL
field of the destination operand is increased to match that of the source
operand. Otherwise, the ZF flag is cleared and no change is made to the
destination operand. The destination operand can be a word general register
or memory, the source operand must be a general register.
</p>
<pre class="smallcode">    arpl bx,ax       ; adjust RPL of selector in register
    arpl [bx],ax     ; adjust RPL of selector in memory
</pre>
<p class="smalltext">
<span class="smallcode">clts</span> clears the TS (task switched) flag in the CR0 register. This
instruction has no operands.
</p>
<p class="smalltext">
<span class="smallcode">lock</span> prefix causes the processor's bus-lock signal to be asserted during
execution of the accompanying instruction. In a multiprocessor environment,
the bus-lock signal ensures that the processor has exclusive use of any shared
memory while the signal is asserted. The <span class="smallcode">lock</span> prefix can be prepended only
to the following instructions and only to those forms of the instructions
where the destination operand is a memory operand: <span class="smallcode">add</span>, <span class="smallcode">adc</span>, <span class="smallcode">and</span>, <span class="smallcode">btc</span>,
<span class="smallcode">btr</span>, <span class="smallcode">bts</span>, <span class="smallcode">cmpxchg</span>, <span class="smallcode">cmpxchg8b</span>, <span class="smallcode">dec</span>,
<span class="smallcode">inc</span>, <span class="smallcode">neg</span>, <span class="smallcode">not</span>, <span class="smallcode">or</span>, <span class="smallcode">sbb</span>,
<span class="smallcode">sub</span>, <span class="smallcode">xor</span>, <span class="smallcode">xadd</span> and <span class="smallcode">xchg</span>.
If the <span class="smallcode">lock</span> prefix is used with one of
these instructions and the source operand is a memory operand, an undefined
opcode exception may be generated. An undefined opcode exception will also be
generated if the <span class="smallcode">lock</span> prefix is used with any instruction not in the above
list. The <span class="smallcode">xchg</span> instruction always asserts the bus-lock signal regardless of
the presence or absence of the <span class="smallcode">lock</span> prefix.
</p>
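<p class="smalltext">
As an illustration, the <span class="smallcode">lock</span> prefix can be combined with <span class="smallcode">cmpxchg</span> to update a
shared variable atomically. The sketch below increments a double word, assuming
32-bit code and EBX pointing to the variable; the <span class="smallcode">retry</span> label is hypothetical.
For a plain increment <span class="smallcode">lock inc dword [ebx]</span> would be enough, the loop only
shows the general pattern.
</p>
<pre class="smallcode">    mov eax,[ebx]          ; eax = expected current value
retry:
    mov ecx,eax
    inc ecx                ; ecx = desired new value
    lock cmpxchg [ebx],ecx ; store ecx only if [ebx] still equals eax
    jnz retry              ; otherwise eax holds the current value, try again
</pre>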
<p class="smalltext">
<span class="smallcode">hlt</span> stops instruction execution and places the processor in a halted
state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
signal will resume execution. This instruction has no operands.
</p>
<p class="smalltext">
<span class="smallcode">invlpg</span> invalidates (flushes) the TLB (translation lookaside buffer) entry
specified with the operand, which should be a memory location. The processor determines
the page that contains that address and flushes the TLB entry for that page.
</p>
<p class="smalltext">
<span class="smallcode">rdmsr</span> loads the contents of a 64-bit MSR (model specific register) of the
address specified in the ECX register into registers EDX and EAX. <span class="smallcode">wrmsr</span>
writes the contents of registers EDX and EAX into the 64-bit MSR of the
address specified in the ECX register. <span class="smallcode">rdtsc</span> loads the current value of the
processor's time stamp counter from the 64-bit MSR into the EDX and EAX
registers. The processor increments the time stamp counter MSR every clock
cycle and resets it to 0 whenever the processor is reset. <span class="smallcode">rdpmc</span> loads the
contents of the 40-bit performance monitoring counter specified in the ECX
register into registers EDX and EAX. These instructions have no operands.
</p>
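<p class="smalltext">
A rough way to use <span class="smallcode">rdtsc</span> for measuring a piece of code is sketched below.
It assumes 32-bit code and that the measured code preserves ESI and EDI; it
makes no attempt to serialize execution, so the result is only approximate.
</p>
<pre class="smallcode">    rdtsc                ; edx:eax = time stamp counter
    mov esi,eax
    mov edi,edx          ; keep the starting value in edi:esi
    ; the code being measured would be placed here
    rdtsc
    sub eax,esi
    sbb edx,edi          ; edx:eax = number of counts elapsed
</pre>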
<p class="smalltext">
<span class="smallcode">wbinvd</span> writes back all modified cache lines in the processor's internal
cache to main memory and invalidates (flushes) the internal caches. The
instruction then issues a special function bus cycle that directs external
caches to also write back modified data and another bus cycle to indicate that
the external caches should be invalidated. This instruction has no operands.
</p>
<p class="smalltext">
<span class="smallcode">rsm</span> returns program control from the system management mode to the program
that was interrupted when the processor received an SMM interrupt. This
instruction has no operands.
</p>
<p class="smalltext">
<span class="smallcode">sysenter</span> executes a fast call to a level 0 system procedure, <span class="smallcode">sysexit</span>
executes a fast return to level 3 user code. The addresses used by these
instructions are stored in MSRs. These instructions have no operands.
</p>

<p><b>
<a name="2.1.13" class="smalltext">2.1.13  FPU instructions</a>
</b></p>

<p class="smalltext">
The FPU (Floating-Point Unit) instructions operate on the floating-point
values in three formats: single precision (32-bit), double precision (64-bit)
and double extended precision (80-bit). The FPU registers form the stack and
each of them holds the double extended precision floating-point value. When
some values are pushed onto the stack or are removed from the top, the FPU
registers are shifted, so ST0 is always the value on the top of FPU stack, ST1
is the first value below the top, etc. The ST0 name has also the synonym ST.
</p>
<p class="smalltext">
<span class="smallcode">fld</span> pushes the floating-point value onto the FPU register stack. The
operand can be 32-bit, 64-bit or 80-bit memory location or the FPU register,
its value is then loaded onto the top of FPU register stack (the ST0
register) and is automatically converted into the double extended precision
format.
</p>
<pre class="smallcode">    fld dword [bx]   ; load single precision value from memory
    fld st2          ; push value of st2 onto register stack
</pre>
<p class="smalltext">
<span class="smallcode">fld1</span>, <span class="smallcode">fldz</span>, <span class="smallcode">fldl2t</span>,
<span class="smallcode">fldl2e</span>, <span class="smallcode">fldpi</span>, <span class="smallcode">fldlg2</span>
and <span class="smallcode">fldln2</span> load the commonly used constants onto the FPU register stack.
The loaded constants are +1.0, +0.0, log<sub>2</sub>10, log<sub>2</sub>e, π, log<sub>10</sub>2 and ln 2 respectively. These instructions have no operands.
</p>
<p class="smalltext">
<span class="smallcode">fild</span> converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.
</p>
<pre class="smallcode">    fild qword [bx]  ; load 64-bit integer from memory
</pre>
<p class="smalltext">
<span class="smallcode">fst</span> copies the value of ST0 register to the destination operand, which
can be 32-bit or 64-bit memory location or another FPU register. <span class="smallcode">fstp</span>
performs the same operation as <span class="smallcode">fst</span> and then pops the register stack,
getting rid of ST0. <span class="smallcode">fstp</span> accepts the same operands as the <span class="smallcode">fst</span> instruction
and can also store value in the 80-bit memory.
</p>
<pre class="smallcode">    fst st3          ; copy value of st0 into st3 register
    fstp tword [bx]  ; store value in memory and pop stack
</pre>
<p class="smalltext">
<span class="smallcode">fist</span> converts the value in ST0 to a signed integer and stores the result
in the destination operand. The operand can be 16-bit or 32-bit memory
location. <span class="smallcode">fistp</span> performs the same operation and then pops the register
stack, it accepts the same operands as the <span class="smallcode">fist</span> instruction and can also
store integer value in the 64-bit memory, so it has the same rules for
operands as <span class="smallcode">fild</span> instruction.
</p>
<p class="smalltext">
<span class="smallcode">fbld</span> converts the packed BCD integer into double extended precision
floating-point format and pushes this value onto the FPU stack. <span class="smallcode">fbstp</span>
converts the value in ST0 to an 18-digit packed BCD integer, stores the result
in the destination operand, and pops the register stack. The operand should be
an 80-bit memory location.
</p>
<p class="smalltext">
<span class="smallcode">fadd</span> adds the destination and source operand and stores the sum in the
destination location. The destination operand is always an FPU register, if
the source is a memory location, the destination is ST0 register and only
source operand should be specified. If both operands are FPU registers, at
least one of them should be ST0 register. An operand in memory can be a
32-bit or 64-bit value.
</p>
<pre class="smallcode">    fadd qword [bx]  ; add double precision value to st0
    fadd st2,st0     ; add st0 to st2
</pre>
<p class="smalltext">
<span class="smallcode">faddp</span> adds the destination and source operand, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be the ST0. When no
operands are specified, ST1 is used as a destination operand.
</p>
<pre class="smallcode">    faddp            ; add st0 to st1 and pop the stack
    faddp st2,st0    ; add st0 to st2 and pop the stack
</pre>
<p class="smalltext">
<span class="smallcode">fiadd</span> instruction converts an integer source operand into double extended
precision floating-point value and adds it to the destination operand. The
operand should be a 16-bit or 32-bit memory location.
</p>
<pre class="smallcode">    fiadd word [bx]  ; add word integer to st0
</pre>
<p class="smalltext">
<span class="smallcode">fsub</span>, <span class="smallcode">fsubr</span>, <span class="smallcode">fmul</span>, <span class="smallcode">fdiv</span>, <span class="smallcode">fdivr</span> instructions are similar to <span class="smallcode">fadd</span>,
have the same rules for operands and differ only in the performed computation.
<span class="smallcode">fsub</span> subtracts the source operand from the destination operand, <span class="smallcode">fsubr</span>
subtracts the destination operand from the source operand, <span class="smallcode">fmul</span> multiplies
the destination and source operands, <span class="smallcode">fdiv</span> divides the destination operand by
the source operand and <span class="smallcode">fdivr</span> divides the source operand by the destination
operand. <span class="smallcode">fsubp</span>, <span class="smallcode">fsubrp</span>, <span class="smallcode">fmulp</span>, <span class="smallcode">fdivp</span>, <span class="smallcode">fdivrp</span> perform the same
operations and pop the register stack, the rules for operand are the same as
for the <span class="smallcode">faddp</span> instruction. <span class="smallcode">fisub</span>, <span class="smallcode">fisubr</span>, <span class="smallcode">fimul</span>, <span class="smallcode">fidiv</span>, <span class="smallcode">fidivr</span>
perform these operations after converting the integer source operand into
floating-point value, they have the same rules for operands as <span class="smallcode">fiadd</span>
instruction.
</p>
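<p class="smalltext">
The arithmetic instructions can be chained to evaluate a simple expression on
the top of the register stack. The sketch below computes (first+second)*factor
on single precision values; it assumes 32-bit code, and all the variable names
are hypothetical.
</p>
<pre class="smallcode">    fld dword [first]    ; st0 = first
    fadd dword [second]  ; st0 = first+second
    fmul dword [factor]  ; st0 = (first+second)*factor
    fstp dword [result]  ; store the value and pop it from the stack
    ; elsewhere, in the data area:
first dd 1.5
second dd 2.5
factor dd 2.0
result dd ?
</pre>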
<p class="smalltext">
<span class="smallcode">fsqrt</span> computes the square root of the value in ST0 register, <span class="smallcode">fsin</span>
computes the sine of that value, <span class="smallcode">fcos</span> computes the cosine of that value,
<span class="smallcode">fchs</span> complements its sign bit, <span class="smallcode">fabs</span> clears its sign to create the absolute
value, <span class="smallcode">frndint</span> rounds it to the nearest integral value, depending on the
current rounding mode. <span class="smallcode">f2xm1</span> computes the exponential value of 2 to the
power of ST0 and subtracts 1.0 from it, the value of ST0 must lie in the
range -1.0 to +1.0. All these instructions store the result in ST0 and have no
operands.
</p>
<p class="smalltext">
<span class="smallcode">fsincos</span> computes both the sine and the cosine of the value in ST0
register, stores the sine in ST0 and pushes the cosine on the top of FPU
register stack. <span class="smallcode">fptan</span> computes the tangent of the value in ST0, stores the
result in ST0 and pushes a 1.0 onto the FPU register stack. <span class="smallcode">fpatan</span> computes
the arctangent of the value in ST1 divided by the value in ST0, stores the
result in ST1 and pops the FPU register stack. <span class="smallcode">fyl2x</span> computes the binary
logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
FPU register stack; <span class="smallcode">fyl2xp1</span> performs the same operation but it adds 1.0 to
ST0 before computing the logarithm. <span class="smallcode">fprem</span> computes the remainder obtained
from dividing the value in ST0 by the value in ST1, and stores the result
in ST0. <span class="smallcode">fprem1</span> performs the same operation as <span class="smallcode">fprem</span>, but it computes the
remainder in the way specified by IEEE Standard 754. <span class="smallcode">fscale</span> truncates the
value in ST1 and increases the exponent of ST0 by this value. <span class="smallcode">fxtract</span>
separates the value in ST0 into its exponent and significand, stores the
exponent in ST0 and pushes the significand onto the register stack. <span class="smallcode">fnop</span>
performs no operation. These instructions have no operands.
</p>
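<p class="smalltext">
As an example of combining these instructions, the natural logarithm can be
obtained from <span class="smallcode">fyl2x</span> by using ln 2 as the multiplier. This is a sketch
assuming 32-bit code and a positive input value; the <span class="smallcode">x</span> and <span class="smallcode">y</span> variables are
hypothetical.
</p>
<pre class="smallcode">    fldln2               ; st0 = ln 2
    fld qword [x]        ; st0 = x, st1 = ln 2
    fyl2x                ; st0 = ln 2 * log2(x) = ln x, stack popped once
    fstp qword [y]       ; store the result and pop it
    ; elsewhere, in the data area:
x dq 10.0
y dq ?
</pre>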
<p class="smalltext">
<span class="smallcode">fxch</span> exchanges the contents of ST0 and another FPU register. The operand
should be an FPU register, if no operand is specified, the contents of ST0 and
ST1 are exchanged.
</p>
<p class="smalltext">
<span class="smallcode">fcom</span> and <span class="smallcode">fcomp</span> compare the contents of ST0 and the source operand and
set flags in the FPU status word according to the results. <span class="smallcode">fcomp</span>
additionally pops the register stack after performing the comparison. The
operand can be a single or double precision value in memory or the FPU
register. When no operand is specified, ST1 is used as a source operand.
</p>
<pre class="smallcode">    fcom             ; compare st0 with st1
    fcomp st2        ; compare st0 with st2 and pop stack
</pre>
<p class="smalltext">
<span class="smallcode">fcompp</span> compares the contents of ST0 and ST1, sets flags in the FPU status
word according to the results and pops the register stack twice. This
instruction has no operands.
</p>
<p class="smalltext">
<span class="smallcode">fucom</span>, <span class="smallcode">fucomp</span> and <span class="smallcode">fucompp</span> perform an unordered comparison of two FPU
registers. Rules for operands are the same as for the <span class="smallcode">fcom</span>, <span class="smallcode">fcomp</span> and
<span class="smallcode">fcompp</span>, but the source operand must be an FPU register.
</p>
<p class="smalltext">
<span class="smallcode">ficom</span> and <span class="smallcode">ficomp</span> compare the value in ST0 with an integer source operand
and set the flags in the FPU status word according to the results. <span class="smallcode">ficomp</span>
additionally pops the register stack after performing the comparison. The
integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit
memory location.
</p>
<pre class="smallcode">    ficom word [bx]  ; compare st0 with 16-bit integer
</pre>
<p class="smalltext">
<span class="smallcode">fcomi</span>, <span class="smallcode">fcomip</span>, <span class="smallcode">fucomi</span>, <span class="smallcode">fucomip</span> perform the comparison of ST0 with
another FPU register and set the ZF, PF and CF flags according to the results.
<span class="smallcode">fcomip</span> and <span class="smallcode">fucomip</span> additionally pop the register stack after performing the
comparison. The instructions obtained by attaching the FPU condition mnemonic
(see table <a href="#_2.2">2.2</a>) to the <span class="smallcode">fcmov</span> mnemonic transfer the specified FPU register
into ST0 register if the given test condition is true. These instructions
allow two different syntaxes, one with single operand specifying the source
FPU register, and one with two operands, in that case destination operand
should be ST0 register and the second operand specifies the source FPU
register.
</p>
<pre class="smallcode">    fcomi st2        ; compare st0 with st2 and set flags
    fcmovb st0,st2   ; transfer st2 to st0 if below
</pre>
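<p class="smalltext">
Used together, the two instructions allow a branchless selection. The sketch
below leaves the larger of ST0 and ST1 in ST0. Note that an unordered result
(when one of the values is a NaN) also sets CF, so in that case ST1 is copied
as well.
</p>
<pre class="smallcode">    fcomi st1            ; compare st0 with st1, CF is set when st0 is below
    fcmovb st0,st1       ; when below, replace st0 with the value of st1
</pre>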
<p class="smalltext">
<b><a name="_2.2">Table 2.2  FPU conditions</a></b>
</p>
<table class="doctable" style="width: 350px;">
  <tr>
    <th>Mnemonic</th>
    <th>Condition tested</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><span class="smallcode">b</span></td>
    <td>CF = 1</td>
    <td>below</td>
  </tr>
  <tr>
    <td><span class="smallcode">e</span></td>
    <td>ZF = 1</td>
    <td>equal</td>
  </tr>
  <tr>
    <td><span class="smallcode">be</span></td>
    <td>CF <span class="smallcode">or</span> ZF = 1</td>
    <td>below or equal</td>
  </tr>
  <tr>
    <td><span class="smallcode">u</span></td>
    <td>PF = 1</td>
    <td>unordered</td>
  </tr>
  <tr>
    <td><span class="smallcode">nb</span></td>
    <td>CF = 0</td>
    <td>not below</td>
  </tr>
  <tr>
    <td><span class="smallcode">ne</span></td>
    <td>ZF = 0</td>
    <td>not equal</td>
  </tr>
  <tr>
    <td><span class="smallcode">nbe</span></td>
    <td>CF <span class="smallcode">or</span> ZF = 0</td>
    <td>not below nor equal</td>
  </tr>
  <tr>
    <td><span class="smallcode">nu</span></td>
    <td>PF = 0</td>
    <td>not unordered</td>
  </tr>
</table>
<p class="smalltext">
<span class="smallcode">ftst</span> compares the value in ST0 with 0.0 and sets the flags in the FPU
status word according to the results. <span class="smallcode">fxam</span> examines the contents of the ST0
and sets the flags in FPU status word to indicate the class of value in the
register. These instructions have no operands.
</p>
<p class="smalltext">
<span class="smallcode">fstsw</span> and <span class="smallcode">fnstsw</span> store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory or
the AX register. <span class="smallcode">fstsw</span> checks for pending unmasked FPU exceptions before
storing the status word, <span class="smallcode">fnstsw</span> does not.
</p>
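<p class="smalltext">
Storing the status word in AX makes it possible to test the result of
<span class="smallcode">fcom</span> with ordinary conditional jumps, because <span class="smallcode">sahf</span> copies the condition
bits C0, C2 and C3 from AH into the CF, PF and ZF flags. The sketch below
arranges two values so that the larger one ends up in ST0; the <span class="smallcode">order_done</span>
label is hypothetical, and the contents of AX and the flags are overwritten.
</p>
<pre class="smallcode">    fcom st1             ; compare st0 with st1
    fnstsw ax            ; ax = FPU status word
    sahf                 ; C0 goes to CF, C2 to PF, C3 to ZF
    jp order_done        ; unordered (NaN operand), leave the stack unchanged
    jae order_done       ; st0 is already above or equal
    fxch                 ; st0 was below st1, so exchange them
order_done:
</pre>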
<p class="smalltext">
<span class="smallcode">fstcw</span> and <span class="smallcode">fnstcw</span> store the current value of the FPU control word at the
specified destination in memory. <span class="smallcode">fstcw</span> checks for pending unmasked FPU
exceptions before storing the control word, <span class="smallcode">fnstcw</span> does not. <span class="smallcode">fldcw</span> loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
</p>
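<p class="smalltext">
A typical use of these instructions is a temporary change of the rounding mode,
which is selected by bits 10 and 11 of the control word. The sketch below
stores ST0 as an integer truncated toward zero and then restores the previous
mode; it assumes 32-bit code, and all the variable names are hypothetical.
</p>
<pre class="smallcode">    fnstcw [saved_cw]    ; remember the current control word
    mov ax,[saved_cw]
    or ax,0C00h          ; set both rounding control bits: truncate
    mov [temp_cw],ax
    fldcw [temp_cw]      ; switch to rounding toward zero
    fistp dword [result] ; store st0 as a truncated integer and pop
    fldcw [saved_cw]     ; restore the original rounding mode
    ; elsewhere, in the data area:
saved_cw dw ?
temp_cw dw ?
result dd ?
</pre>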
<p class="smalltext">
<span class="smallcode">fstenv</span> and <span class="smallcode">fnstenv</span> store the current FPU operating environment at the
memory location specified with the destination operand, and then mask all FPU
exceptions. <span class="smallcode">fstenv</span> checks for pending unmasked FPU exceptions before
proceeding, <span class="smallcode">fnstenv</span> does not. <span class="smallcode">fldenv</span> loads the complete operating
environment from memory into the FPU. <span class="smallcode">fsave</span> and <span class="smallcode">fnsave</span> store the current
FPU state (operating environment and register stack) at the specified
destination in memory and reinitialize the FPU. <span class="smallcode">fsave</span> checks for pending
unmasked FPU exceptions before proceeding, <span class="smallcode">fnsave</span> does not. <span class="smallcode">frstor</span>
loads the FPU state from the specified memory location. All these instructions
need an operand being a memory location.
For each of these instructions
there are two additional mnemonics that allow precise selection of the type of the
operation. The <span class="smallcode">fstenvw</span>, <span class="smallcode">fnstenvw</span>, <span class="smallcode">fldenvw</span>, <span class="smallcode">fsavew</span>, <span class="smallcode">fnsavew</span> and
<span class="smallcode">frstorw</span> mnemonics force the instruction to perform operation as in the 16-bit
mode, while <span class="smallcode">fstenvd</span>, <span class="smallcode">fnstenvd</span>, <span class="smallcode">fldenvd</span>, <span class="smallcode">fsaved</span>, <span class="smallcode">fnsaved</span> and <span class="smallcode">frstord</span>
force the operation as in 32-bit mode.
</p>
<p class="smalltext">
<span class="smallcode">finit</span> and <span class="smallcode">fninit</span> set the FPU operating environment into its default
state. <span class="smallcode">finit</span> checks for pending unmasked FPU exceptions before proceeding,
<span class="smallcode">fninit</span> does not. <span class="smallcode">fclex</span> and <span class="smallcode">fnclex</span> clear the FPU exception flags in the
FPU status word. <span class="smallcode">fclex</span> checks for pending unmasked FPU exceptions before
proceeding, <span class="smallcode">fnclex</span> does not. <span class="smallcode">wait</span> and <span class="smallcode">fwait</span> are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU
exceptions and handle them before proceeding. These instructions have no
operands.
</p>
<p class="smalltext">
<span class="smallcode">ffree</span> sets the tag associated with specified FPU register to empty. The
operand should be an FPU register.
</p>
<p class="smalltext">
<span class="smallcode">fincstp</span> and <span class="smallcode">fdecstp</span> rotate the FPU stack by one by adding one to or
subtracting one from the pointer of the top of stack. These instructions have no
operands.
</p>

<p><b>
<a name="2.1.14" class="smalltext">2.1.14  MMX instructions</a>
</b></p>

<p class="smalltext">
The MMX instructions operate on the packed integer types and use the MMX
registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
of this MMX instructions cannot be used at the same time as FPU instructions.
They can operate on packed bytes (eight 8-bit integers), packed words (four
16-bit integers) or packed double words (two 32-bit integers), use of packed
formats allows operations to be performed on multiple data elements at one time.
</p>
<p class="smalltext">
<span class="smallcode">movq</span> copies a quad word from the source operand to the destination
operand. At least one of the operands must be a MMX register, the second one
can be also a MMX register or 64-bit memory location.
</p>
<pre class="smallcode">    movq mm0,mm1     ; move quad word from register to register
    movq mm2,[ebx]   ; move quad word from memory to register
</pre>
<p class="smalltext">
<span class="smallcode">movd</span> copies a double word from the source operand to the destination
operand. One of the operands must be a MMX register, the second one can be a
general register or 32-bit memory location. Only low double word of MMX
register is used.
</p>
<p class="smalltext">
All general MMX operations have two operands, the destination operand should
be an MMX register, the source operand can be an MMX register or a 64-bit memory
location. The operation is performed on the corresponding data elements of the
source and destination operands and the result is stored in the data elements of
the destination operand. <span class="smallcode">paddb</span>, <span class="smallcode">paddw</span> and <span class="smallcode">paddd</span> perform the addition of
packed bytes, packed words, or packed double words.  <span class="smallcode">psubb</span>, <span class="smallcode">psubw</span> and
<span class="smallcode">psubd</span> perform the subtraction of appropriate types. <span class="smallcode">paddsb</span>, <span class="smallcode">paddsw</span>,
<span class="smallcode">psubsb</span> and <span class="smallcode">psubsw</span> perform the addition or subtraction of packed bytes
or packed words with signed saturation. <span class="smallcode">paddusb</span>, <span class="smallcode">paddusw</span>, <span class="smallcode">psubusb</span>,
<span class="smallcode">psubusw</span> are analogous, but with unsigned saturation. <span class="smallcode">pmulhw</span> and <span class="smallcode">pmullw</span>
perform a signed multiplication of the packed words and store the high or low words
of the results in the destination operand. <span class="smallcode">pmaddwd</span> performs a multiplication of
the packed words and adds the four intermediate double word products in pairs
to produce the result as packed double words. <span class="smallcode">pand</span>, <span class="smallcode">por</span> and <span class="smallcode">pxor</span> perform
the logical operations on the quad words, <span class="smallcode">pandn</span> also performs a logical
negation of the destination operand before performing the <span class="smallcode">and</span> operation.
<span class="smallcode">pcmpeqb</span>, <span class="smallcode">pcmpeqw</span> and <span class="smallcode">pcmpeqd</span> compare packed bytes, packed words or
packed double words for equality. If a pair of data elements is equal, the
corresponding data element in the destination operand is filled with bits of
value 1, otherwise it is set to 0. <span class="smallcode">pcmpgtb</span>, <span class="smallcode">pcmpgtw</span> and <span class="smallcode">pcmpgtd</span> perform
a similar operation, but they check whether the data elements in the
destination operand are greater (as signed integers) than the corresponding
data elements in the source operand. <span class="smallcode">packsswb</span> converts packed signed words
into packed signed bytes, <span class="smallcode">packssdw</span> converts packed signed double words into
packed signed words, using saturation to handle overflow conditions. <span class="smallcode">packuswb</span>
converts packed signed words into packed unsigned bytes. Converted data elements
from the destination operand are stored in the low part of the destination
operand, while converted data elements from the source operand are stored in the
high part. <span class="smallcode">punpckhbw</span>, <span class="smallcode">punpckhwd</span> and <span class="smallcode">punpckhdq</span> interleave the data
elements from the high parts of the source and destination operands and
store the result into the destination operand. <span class="smallcode">punpcklbw</span>, <span class="smallcode">punpcklwd</span> and
<span class="smallcode">punpckldq</span> perform the same operation, but the low parts of the source and
destination operands are used.
</p>
<pre class="smallcode">    paddsb mm0,[esi] ; add packed bytes with signed saturation
    pcmpeqw mm3,mm7  ; compare packed words for equality
</pre>
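<p class="smalltext">
The interleaving instructions are often combined with a zeroed register to
widen data elements. The following sketch, which assumes that ESI points to
eight unsigned bytes, extends them into two groups of four words and adds the
groups together:
</p>
<pre class="smallcode">    pxor mm7,mm7       ; clear mm7
    movq mm0,[esi]     ; load eight packed bytes
    movq mm1,mm0       ; keep a copy for the high part
    punpcklbw mm0,mm7  ; interleave low four bytes with zeros, giving four words
    punpckhbw mm1,mm7  ; interleave high four bytes with zeros, giving four words
    paddw mm0,mm1      ; add the two groups of packed words
</pre>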
<p class="smalltext">
<span class="smallcode">psllw</span>, <span class="smallcode">pslld</span> and <span class="smallcode">psllq</span> perform logical shift left of the packed words,
packed double words or a single quad word in the destination operand by the
amount specified in the source operand. <span class="smallcode">psrlw</span>, <span class="smallcode">psrld</span> and <span class="smallcode">psrlq</span> perform
logical shift right of the packed words, packed double words or a single quad
word. <span class="smallcode">psraw</span> and <span class="smallcode">psrad</span> perform arithmetic shift right of the packed words or
double words. The destination operand should be an MMX register, while the source
operand can be an MMX register, a 64-bit memory location, or an 8-bit immediate
value.
</p>
<pre class="smallcode">    psllw mm2,mm4    ; shift words left logically
    psrad mm4,[ebx]  ; shift double words right arithmetically
</pre>
<p class="smalltext">
<span class="smallcode">emms</span> makes the FPU registers usable for the FPU instructions. It must be
executed before any FPU instructions if any MMX instructions were used.
</p>
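<p class="smalltext">
A minimal sketch of the transition from MMX code to FPU code might look like
this (the surrounding instructions are only placeholders):
</p>
<pre class="smallcode">    paddw mm0,mm1    ; last MMX instruction of the block
    emms             ; make the FPU registers usable again
    fld1             ; FPU instructions can be used from here on
</pre>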

<p><b>
<a name="2.1.15" class="smalltext">2.1.15  SSE instructions</a>
</b></p>

<p class="smalltext">
The SSE extension adds more MMX instructions and also introduces the
operations on packed single precision floating point values. The 128-bit
packed single precision format consists of four single precision floating
point values. The 128-bit SSE registers are designed for the purpose of
operations on this data type.
</p>
<p class="smalltext">
<span class="smallcode">movaps</span> and <span class="smallcode">movups</span> transfer a double quad word operand containing packed
single precision values from source operand to destination operand. At least
one of the operands has to be an SSE register, the second one can also be an
SSE register or a 128-bit memory location. Memory operands for the <span class="smallcode">movaps</span>
instruction must be aligned on a 16-byte boundary, operands for the <span class="smallcode">movups</span>
instruction don't have to be aligned.
</p>
<pre class="smallcode">    movups xmm0,[ebx]  ; move unaligned double quad word
</pre>
<p class="smalltext">
<span class="smallcode">movlps</span> moves two packed single precision values between the memory and the
low quad word of an SSE register. <span class="smallcode">movhps</span> moves two packed single precision
values between the memory and the high quad word of an SSE register. One of the
operands must be an SSE register, and the other operand must be a 64-bit memory
location.
</p>
<pre class="smallcode">    movlps xmm0,[ebx]  ; move memory to low quad word of xmm0
    movhps [esi],xmm7  ; move high quad word of xmm7 to memory
</pre>
<p class="smalltext">
<span class="smallcode">movlhps</span> moves two packed single precision values from the low quad word
of the source register to the high quad word of the destination register. <span class="smallcode">movhlps</span>
moves two packed single precision values from the high quad word of the source
register to the low quad word of the destination register. Both operands have to
be SSE registers.
</p>
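<p class="smalltext">
For illustration, with arbitrarily chosen registers:
</p>
<pre class="smallcode">    movlhps xmm0,xmm1  ; copy low quad word of xmm1 to high quad word of xmm0
    movhlps xmm2,xmm3  ; copy high quad word of xmm3 to low quad word of xmm2
</pre>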
<p class="smalltext">
<span class="smallcode">movmskps</span> transfers the most significant bit of each of the four single
precision values in the SSE register into the low four bits of a general register.
The source operand must be an SSE register, the destination operand must be a
general register.
</p>
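<p class="smalltext">
Since the most significant bit of a single precision value is its sign bit,
this instruction can be used to test the signs of four values at once. In the
sketch below <span class="smallcode">sign_found</span> is a hypothetical label, not something defined in
this manual:
</p>
<pre class="smallcode">    movmskps eax,xmm0  ; collect four sign bits into low bits of eax
    test eax,eax       ; check whether any of them is set
    jnz sign_found     ; jump if at least one value has its sign bit set
</pre>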
<p class="smalltext">
<span class="smallcode">movss</span> transfers a single precision value between source and destination
operand (only the low double word is transferred). At least one of the operands
has to be an SSE register, the second one can also be an SSE register or a 32-bit
memory location.
</p>
<pre class="smallcode">    movss [edi],xmm3   ; move low double word of xmm3 to memory
</pre>
<p class="smalltext">
Each of the SSE arithmetic operations has two variants. When the mnemonic
ends with <span class="smallcode">ps</span>, the source operand can be a 128-bit memory location or an SSE
register, the destination operand must be an SSE register and the operation is
performed on four packed single precision values, for each pair of the
corresponding data elements separately. The result is stored in the
destination register. When the mnemonic ends with <span class="smallcode">ss</span>, the source operand
can be a 32-bit memory location or an SSE register, the destination operand
must be an SSE register and the operation is performed on single precision
values. Only the low double words of SSE registers are used in this case, and the
result is stored in the low double word of the destination register. <span class="smallcode">addps</span> and
<span class="smallcode">addss</span> add the values, <span class="smallcode">subps</span> and <span class="smallcode">subss</span> subtract the source value from
the destination value, <span class="smallcode">mulps</span> and <span class="smallcode">mulss</span> multiply the values, <span class="smallcode">divps</span> and
<span class="smallcode">divss</span> divide the destination value by the source value, <span class="smallcode">rcpps</span> and <span class="smallcode">rcpss</span>
compute the approximate reciprocal of the source value, <span class="smallcode">sqrtps</span> and <span class="smallcode">sqrtss</span>
compute the square root of the source value, <span class="smallcode">rsqrtps</span> and <span class="smallcode">rsqrtss</span> compute
the approximate reciprocal of square root of the source value, <span class="smallcode">maxps</span> and
<span class="smallcode">maxss</span> compare the source and destination values and return the greater one,
<span class="smallcode">minps</span> and <span class="smallcode">minss</span> compare the source and destination values and return the
lesser one.
</p>
<pre class="smallcode">    mulss xmm0,[ebx]   ; multiply single precision values
    addps xmm3,xmm7    ; add packed single precision values
</pre>
<p class="smalltext">
<span class="smallcode">andps</span>, <span class="smallcode">andnps</span>, <span class="smallcode">orps</span> and <span class="smallcode">xorps</span> perform the logical operations on
packed single precision values. The source operand can be a 128-bit memory
location or an SSE register, the destination operand must be an SSE register.
</p>
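<p class="smalltext">
Two common uses of these instructions are sketched below. The second one
assumes that EBX points to a 16-byte aligned mask made of four double words
with value 7FFFFFFFh, which is not defined anywhere in this manual:
</p>
<pre class="smallcode">    xorps xmm0,xmm0    ; clear all bits of xmm0
    andps xmm1,[ebx]   ; clear the sign bits of four values in xmm1
</pre>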
<p class="smalltext">
<span class="smallcode">cmpps</span> compares packed single precision values and returns a mask result
into the destination operand, which must be an SSE register. The source operand
can be a 128-bit memory location or an SSE register, the third operand must be an
immediate operand containing the code of one of the eight compare conditions
(table <a href="#_2.3">2.3</a>). <span class="smallcode">cmpss</span> performs the same operation on single precision values,
only the low double word of the destination register is affected, and in this case
the source operand can be a 32-bit memory location or an SSE register. These two
instructions also have variants with only two operands and the condition
encoded within the mnemonic. Their mnemonics are obtained by attaching the
mnemonic from table <a href="#_2.3">2.3</a> to the <span class="smallcode">cmp</span> mnemonic and then attaching the <span class="smallcode">ps</span> or
<span class="smallcode">ss</span> at the end.
</p>
<pre class="smallcode">    cmpps xmm2,xmm4,0  ; compare packed single precision values
    cmpltss xmm0,[ebx] ; compare single precision values
</pre>
<p class="smalltext">
<b><a name="_2.3">Table 2.3  SSE conditions</a></b>
</p>
<table class="doctable" style="width: 350px;">
  <tr>
    <th>Code</th>
    <th>Mnemonic</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>0</td>
    <td><span class="smallcode">eq</span></td>
    <td>equal</td>
  </tr>
  <tr>
    <td>1</td>
    <td><span class="smallcode">lt</span></td>
    <td>less than</td>
  </tr>
  <tr>
    <td>2</td>
    <td><span class="smallcode">le</span></td>
    <td>less than or equal</td>
  </tr>
  <tr>
    <td>3</td>
    <td><span class="smallcode">unord</span></td>
    <td>unordered</td>
  </tr>
  <tr>
    <td>4</td>
    <td><span class="smallcode">neq</span></td>
    <td>not equal</td>
  </tr>
  <tr>
    <td>5</td>
    <td><span class="smallcode">nlt</span></td>
    <td>not less than</td>
  </tr>
  <tr>
    <td>6</td>
    <td><span class="smallcode">nle</span></td>
    <td>not less than or equal</td>
  </tr>
  <tr>
    <td>7</td>
    <td><span class="smallcode">ord</span></td>
    <td>ordered</td>
  </tr>
</table>
<p class="smalltext">
<span class="smallcode">comiss</span> and <span class="smallcode">ucomiss</span> compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 32-bit memory location or an SSE register.
</p>
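<p class="smalltext">
The flags can then be tested with the conditional jumps normally used after an
unsigned comparison. The sketch below assumes the usual meaning of the flags
(all three set for an unordered result, ZF set for equal values, CF set when
the destination is less than the source). The labels are hypothetical:
</p>
<pre class="smallcode">    comiss xmm0,xmm1   ; compare low single precision values
    jp is_unordered    ; PF set: the values cannot be ordered
    ja is_greater      ; ZF and CF clear: xmm0 is greater than xmm1
    je is_equal        ; ZF set: the values are equal
                       ; otherwise xmm0 is less than xmm1
</pre>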
<p class="smalltext">
<span class="smallcode">shufps</span> moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be an SSE register, the
source operand can be a 128-bit memory location or SSE register, the third
operand must be an 8-bit immediate value selecting which values will be moved
into the destination operand. Bits 0 and 1 select the value to be moved from
destination operand to the low double word of the result, bits 2 and 3 select
the value to be moved from the destination operand to the second double word,
bits 4 and 5 select the value to be moved from the source operand to the third
double word, and bits 6 and 7 select the value to be moved from the source
operand to the high double word of the result.
</p>
<pre class="smallcode">    shufps xmm0,xmm0,10010011b ; shuffle double words
</pre>
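<p class="smalltext">
When the same register is used for both operands, the selector simply
rearranges its four values. Two selectors worked out from the rules above, as
an illustration:
</p>
<pre class="smallcode">    shufps xmm0,xmm0,0          ; copy the lowest value to all four positions
    shufps xmm1,xmm1,00011011b  ; reverse the order of the four values
</pre>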
<p class="smalltext">
<span class="smallcode">unpckhps</span> performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be an SSE register. The source operand can be
a 128-bit memory location or an SSE register. <span class="smallcode">unpcklps</span> performs an
interleaved unpack of the values from the low parts of the source and
destination operands and stores the result in the destination operand,
the rules for operands are the same.
</p>
<p class="smalltext">
<span class="smallcode">cvtpi2ps</span> converts two packed double word integers into two packed
single precision floating point values and stores the result in the low quad
word of the destination operand, which should be an SSE register. The source
operand can be a 64-bit memory location or an MMX register.
</p>
<pre class="smallcode">    cvtpi2ps xmm0,mm0  ; convert integers to single precision values
</pre>
<p class="smalltext">
<span class="smallcode">cvtsi2ss</span> converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which should be an SSE register. The source operand can be a 32-bit
memory location or 32-bit general register.
</p>
<pre class="smallcode">    cvtsi2ss xmm0,eax  ; convert integer to single precision value
</pre>
<p class="smalltext">
<span class="smallcode">cvtps2pi</span> converts two packed single precision floating point values into
two packed double word integers and stores the result in the destination
operand, which should be an MMX register. The source operand can be a 64-bit
memory location or an SSE register, only the low quad word of the SSE register is used.
<span class="smallcode">cvttps2pi</span> performs a similar operation, except that truncation is used to
round the source values to integers, rules for the operands are the same.
</p>
<pre class="smallcode">    cvtps2pi mm0,xmm0  ; convert single precision values to integers
</pre>
<p class="smalltext">
<span class="smallcode">cvtss2si</span> converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which should be
a 32-bit general register. The source operand can be a 32-bit memory location
or an SSE register, only the low double word of the SSE register is used. <span class="smallcode">cvttss2si</span>
performs a similar operation, except that truncation is used to round the
source value to integer, rules for the operands are the same.
</p>
<pre class="smallcode">    cvtss2si eax,xmm0  ; convert single precision value to integer
</pre>
<p class="smalltext">
<span class="smallcode">pextrw</span> copies the word in the source operand specified by the third
operand to the destination operand. The source operand must be an MMX register,
the destination operand must be a 32-bit general register (the high word of
the destination is cleared), the third operand must be an 8-bit immediate value.
</p>
<pre class="smallcode">    pextrw eax,mm0,1   ; extract word into eax
</pre>
<p class="smalltext">
<span class="smallcode">pinsrw</span> inserts a word from the source operand in the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be an MMX register, the source
operand can be a 16-bit memory location or a 32-bit general register (only the low
word of the register is used).
</p>
<pre class="smallcode">    pinsrw mm1,ebx,2   ; insert word from ebx
</pre>
<p class="smalltext">
<span class="smallcode">pavgb</span> and <span class="smallcode">pavgw</span> compute the average of packed bytes or words. <span class="smallcode">pmaxub</span>
returns the maximum values of packed unsigned bytes, <span class="smallcode">pminub</span> returns the
minimum values of packed unsigned bytes, <span class="smallcode">pmaxsw</span> returns the maximum values
of packed signed words, <span class="smallcode">pminsw</span> returns the minimum values of packed signed
words. <span class="smallcode">pmulhuw</span> performs an unsigned multiplication of the packed words and stores
the high words of the results in the destination operand. <span class="smallcode">psadbw</span> computes
the absolute differences of packed unsigned bytes, sums the differences, and
stores the sum in the low word of the destination operand. All these instructions
follow the same rules for operands as the general MMX operations described in
the previous section.
</p>
<p class="smalltext">
<span class="smallcode">pmovmskb</span> creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of the destination
operand. The source operand must be an MMX register, the destination operand
must be a 32-bit general register.
</p>
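<p class="smalltext">
Combined with a packed comparison, this allows testing eight bytes with a
single conditional jump. The following is only a sketch, it assumes that the
bits of the destination register above the low byte are cleared, and
<span class="smallcode">all_equal</span> is a hypothetical label:
</p>
<pre class="smallcode">    pcmpeqb mm0,mm1    ; equal bytes are filled with ones, others with zeros
    pmovmskb eax,mm0   ; bit n of eax is set when the bytes number n were equal
    cmp eax,0FFh       ; are all eight bits set?
    je all_equal       ; jump if every pair of bytes was equal
</pre>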
<p class="smalltext">
<span class="smallcode">pshufw</span> inserts words from the source operand in the destination operand
from the locations specified with the third operand. The destination operand
must be an MMX register, the source operand can be a 64-bit memory location or
an MMX register, the third operand must be an 8-bit immediate value selecting which
values will be moved into the destination operand, in a similar way as the third
operand of the <span class="smallcode">shufps</span> instruction.
</p>
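<p class="smalltext">
Unlike <span class="smallcode">shufps</span>, all four selected words come from the source operand. Two
selectors given as an illustration:
</p>
<pre class="smallcode">    pshufw mm0,mm1,0          ; copy the lowest word of mm1 to all words of mm0
    pshufw mm2,mm2,00011011b  ; reverse the order of the four words in mm2
</pre>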
<p class="smalltext">
<span class="smallcode">movntq</span> moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand should be an
MMX register, the destination operand should be a 64-bit memory location.
<span class="smallcode">movntps</span> stores packed single precision values from the SSE register to
memory using a non-temporal hint. The source operand should be an SSE register,
the destination operand should be a 128-bit memory location. <span class="smallcode">maskmovq</span> stores
selected bytes from the first operand into a 64-bit memory location using a
non-temporal hint. Both operands should be MMX registers, the second operand
selects which bytes from the first operand are written to memory. The
memory location is pointed to by the DI (or EDI) register in the segment selected
by DS.
</p>
<p class="smalltext">
<span class="smallcode">prefetcht0</span>, <span class="smallcode">prefetcht1</span>, <span class="smallcode">prefetcht2</span> and <span class="smallcode">prefetchnta</span> fetch the line
of data from memory that contains the byte specified with the operand to a
specified location in the cache hierarchy.  The operand should be an 8-bit memory
location.
</p>
<p class="smalltext">
<span class="smallcode">sfence</span> performs a serializing operation on all instructions storing to
memory that were issued prior to it. This instruction has no operands.
</p>
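<p class="smalltext">
A non-temporal store is typically followed by <span class="smallcode">sfence</span> before the stored data
is relied upon elsewhere. A minimal sketch, assuming that ESI and EDI point to
the source and destination data:
</p>
<pre class="smallcode">    movq mm0,[esi]     ; load a quad word
    movntq [edi],mm0   ; store it using a non-temporal hint
    sfence             ; serialize the stores issued so far
    emms               ; leave the MMX state before any FPU instructions
</pre>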
<p class="smalltext">
<span class="smallcode">ldmxcsr</span> loads the 32-bit memory operand into the MXCSR register. <span class="smallcode">stmxcsr</span>
stores the contents of MXCSR into a 32-bit memory operand.
</p>
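<p class="smalltext">
These two instructions are usually paired to modify a single field of MXCSR.
The sketch below uses a temporary double word on the stack and assumes that
bit 15 of MXCSR is the flush-to-zero flag:
</p>
<pre class="smallcode">    sub esp,4              ; reserve a double word on the stack
    stmxcsr [esp]          ; store the current contents of MXCSR
    or dword [esp],8000h   ; set bit 15
    ldmxcsr [esp]          ; load the modified value into MXCSR
    add esp,4              ; release the temporary storage
</pre>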
<p class="smalltext">
<span class="smallcode">fxsave</span> saves the current state of the FPU, the MXCSR register, and all the FPU
and SSE registers to a 512-byte memory location specified in the destination
operand. <span class="smallcode">fxrstor</span> reloads data previously stored with the <span class="smallcode">fxsave</span> instruction
from the specified 512-byte memory location. The memory operand for both of these
instructions must be aligned on a 16-byte boundary and should be declared
with no size specified.
</p>
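<p class="smalltext">
One possible way of satisfying both requirements is sketched below. The name
<span class="smallcode">fpu_state</span> is hypothetical, and the data area is assumed to be placed where
the <span class="smallcode">align</span> directive can guarantee the 16-byte alignment:
</p>
<pre class="smallcode">    fxsave [fpu_state]   ; save the FPU and SSE state
    fxrstor [fpu_state]  ; restore the state saved above

    align 16
    fpu_state:           ; label with no size attached
    rb 512               ; reserve 512 bytes for the saved state
</pre>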

<p><b>
<a name="2.1.16" class="smalltext">2.1.16  SSE2 instructions</a>
</b></p>

<p class="smalltext">
The SSE2 extension introduces the operations on packed double precision
floating point values, extends the syntax of MMX instructions, and also adds
some new instructions.
</p>
<p class="smalltext">
<span class="smallcode">movapd</span> and <span class="smallcode">movupd</span> transfer a double quad word operand containing packed
double precision values from source operand to destination operand. These
instructions are analogous to <span class="smallcode">movaps</span> and <span class="smallcode">movups</span> and have the same rules
for operands.
</p>
<p class="smalltext">
<span class="smallcode">movlpd</span> moves a double precision value between the memory and the low quad
word of an SSE register. <span class="smallcode">movhpd</span> moves a double precision value between the memory
and the high quad word of an SSE register. These instructions are analogous to
<span class="smallcode">movlps</span> and <span class="smallcode">movhps</span> and have the same rules for operands.
</p>
<p class="smalltext">
<span class="smallcode">movmskpd</span> transfers the most significant bit of each of the two double
precision values in the SSE register into low two bits of a general register.
This instruction is analogous to <span class="smallcode">movmskps</span> and has the same rules for
operands.
</p>
<p class="smalltext">
<span class="smallcode">movsd</span> transfers a double precision value between source and destination
operand (only the low quad word is transferred). At least one of the operands
has to be an SSE register, the second one can also be an SSE register or a 64-bit
memory location.
</p>
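<p class="smalltext">
For illustration, with arbitrarily chosen operands:
</p>
<pre class="smallcode">    movsd xmm0,[ebx]   ; move double precision value from memory to xmm0
    movsd xmm1,xmm0    ; move low quad word of xmm0 to xmm1
</pre>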
<p class="smalltext">
Arithmetic operations on double precision values are: <span class="smallcode">addpd</span>, <span class="smallcode">addsd</span>,
<span class="smallcode">subpd</span>, <span class="smallcode">subsd</span>, <span class="smallcode">mulpd</span>, <span class="smallcode">mulsd</span>, <span class="smallcode">divpd</span>, <span class="smallcode">divsd</span>,
<span class="smallcode">sqrtpd</span>, <span class="smallcode">sqrtsd</span>,
<span class="smallcode">maxpd</span>, <span class="smallcode">maxsd</span>, <span class="smallcode">minpd</span>, <span class="smallcode">minsd</span>, and they are analogous to the arithmetic
operations on single precision values described in the previous section. When the
mnemonic ends with <span class="smallcode">pd</span> instead of <span class="smallcode">ps</span>, the operation is performed on
two packed double precision values, but rules for operands are the same. When the
mnemonic ends with <span class="smallcode">sd</span> instead of <span class="smallcode">ss</span>, the source operand can be a 64-bit
memory location or an SSE register, the destination operand must be an SSE
register and the operation is performed on double precision values, only the low
quad words of SSE registers are used in this case.
</p>
<p class="smalltext">
<span class="smallcode">andpd</span>, <span class="smallcode">andnpd</span>, <span class="smallcode">orpd</span> and <span class="smallcode">xorpd</span> perform the logical operations on
packed double precision values. They are analogous to SSE logical operations
on single precision values and have the same rules for operands.
</p>
<p class="smalltext">
<span class="smallcode">cmppd</span> compares packed double precision values and returns a
mask result into the destination operand. This instruction is analogous to
<span class="smallcode">cmpps</span> and has the same rules for operands. <span class="smallcode">cmpsd</span> performs the same
operation on double precision values, only the low quad word of the destination
register is affected, and in this case the source operand can be a 64-bit memory
location or an SSE register. Variants with only two operands are obtained by attaching the
condition mnemonic from table <a href="#_2.3">2.3</a> to the <span class="smallcode">cmp</span> mnemonic and then attaching
the <span class="smallcode">pd</span> or <span class="smallcode">sd</span> at the end.
</p>
<p class="smalltext">
<span class="smallcode">comisd</span> and <span class="smallcode">ucomisd</span> compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an SSE register.
</p>
<p class="smalltext">
<span class="smallcode">shufpd</span> moves any of the two double precision values from the destination
operand into the low quad word of the destination operand, and any of the two
values from the source operand into the high quad word of the destination
operand. This instruction is analogous to <span class="smallcode">shufps</span> and has the same rules for
operands. Bit 0 of the third operand selects the value to be moved from the
destination operand, bit 1 selects the value to be moved from the source
operand, the rest of the bits are reserved and must be zeroed.
</p>
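<p class="smalltext">
Two selectors worked out from the rules above, given as an illustration:
</p>
<pre class="smallcode">    shufpd xmm0,xmm0,1  ; swap the two double precision values in xmm0
    shufpd xmm1,xmm2,0  ; keep low value of xmm1, put low value of xmm2 above it
</pre>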
<p class="smalltext">
<span class="smallcode">unpckhpd</span> performs an unpack of the high quad words from the source and
destination operands, <span class="smallcode">unpcklpd</span> performs an unpack of the low quad words from
the source and destination operands. They are analogous to <span class="smallcode">unpckhps</span> and
<span class="smallcode">unpcklps</span>, and have the same rules for operands.
</p>
<p class="smalltext">
<span class="smallcode">cvtps2pd</span> converts two packed single precision floating point values to
two packed double precision floating point values, the destination operand
must be an SSE register, the source operand can be a 64-bit memory location or
an SSE register. <span class="smallcode">cvtpd2ps</span> converts two packed double precision floating
point values to two packed single precision floating point values, the
destination operand must be an SSE register, the source operand can be a
128-bit memory location or an SSE register. <span class="smallcode">cvtss2sd</span> converts a single
precision floating point value to a double precision floating point value, the
destination operand must be an SSE register, the source operand can be a 32-bit
memory location or an SSE register. <span class="smallcode">cvtsd2ss</span> converts a double precision
floating point value to a single precision floating point value, the destination
operand must be an SSE register, the source operand can be a 64-bit memory
location or an SSE register.
</p>
<p class="smalltext">
<span class="smallcode">cvtpi2pd</span> converts two packed double word integers into packed
double precision floating point values, the destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an MMX register.
<span class="smallcode">cvtsi2sd</span> converts a double word integer into a double precision floating
point value, the destination operand must be an SSE register, the source
operand can be a 32-bit memory location or a 32-bit general register. <span class="smallcode">cvtpd2pi</span>
converts packed double precision floating point values into two packed double
word integers, the destination operand should be an MMX register, the source
operand can be a 128-bit memory location or an SSE register. <span class="smallcode">cvttpd2pi</span> performs
a similar operation, except that truncation is used to round the source values
to integers, rules for operands are the same. <span class="smallcode">cvtsd2si</span> converts a double
precision floating point value into a double word integer, the destination
operand should be a 32-bit general register, the source operand can be a
64-bit memory location or an SSE register. <span class="smallcode">cvttsd2si</span> performs a similar
operation, except that truncation is used to round the source value to integer,
rules for operands are the same.
</p>
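<p class="smalltext">
A short round trip between an integer and a double precision value, with
arbitrarily chosen registers:
</p>
<pre class="smallcode">    cvtsi2sd xmm0,eax   ; convert integer in eax to double precision value
    cvttsd2si ecx,xmm0  ; convert it back to integer using truncation
</pre>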
<p class="smalltext">
<span class="smallcode">cvtps2dq</span> and <span class="smallcode">cvttps2dq</span> convert packed single precision floating point
values to four packed double word integers, storing them in the destination
operand. <span class="smallcode">cvtpd2dq</span> and <span class="smallcode">cvttpd2dq</span> convert packed double precision floating
point values to two packed double word integers, storing the result in the low
quad word of the destination operand. <span class="smallcode">cvtdq2ps</span> converts four packed
double word integers to packed single precision floating point values.
<span class="smallcode">cvtdq2pd</span> converts two packed double word integers from the low quad word
of the source operand to packed double precision floating point values.
For all these instructions the destination operand must be an SSE register, the
source operand can be a 128-bit memory location or an SSE register, except for
<span class="smallcode">cvtdq2pd</span>, where the memory operand is a 64-bit location.
</p>
<p class="smalltext">
<span class="smallcode">movdqa</span> and <span class="smallcode">movdqu</span> transfer a double quad word operand containing packed
integers from source operand to destination operand. At least one of the
operands has to be an SSE register, the second one can also be an SSE register
or a 128-bit memory location. Memory operands for the <span class="smallcode">movdqa</span> instruction must be
aligned on a 16-byte boundary, operands for the <span class="smallcode">movdqu</span> instruction don't have
to be aligned.
</p>
<p class="smalltext">
<span class="smallcode">movq2dq</span> moves the contents of the MMX source register to the low quad word
of destination SSE register. <span class="smallcode">movdq2q</span> moves the low quad word from the source
SSE register to the destination MMX register.
</p>
<pre class="smallcode">    movq2dq xmm0,mm1   ; move from MMX register to SSE register
    movdq2q mm0,xmm1   ; move from SSE register to MMX register
</pre>
<p class="smalltext">
All MMX instructions operating on the 64-bit packed integers (those with
mnemonics starting with <span class="smallcode">p</span>) are extended to operate on 128-bit packed
integers located in SSE registers. The additional syntax for these instructions
needs an SSE register where an MMX register was needed, and a 128-bit memory
location or SSE register where a 64-bit memory location or MMX register was
needed. The exception is the <span class="smallcode">pshufw</span> instruction, which doesn't allow the extended
syntax, but has two new variants: <span class="smallcode">pshufhw</span> and <span class="smallcode">pshuflw</span>, which allow only
the extended syntax, and perform the same operation as <span class="smallcode">pshufw</span> on the high
or low quad words of operands respectively. Also the new instruction <span class="smallcode">pshufd</span>
is introduced, which performs the same operation as <span class="smallcode">pshufw</span>, but on
double words instead of words, and allows only the extended syntax.
</p>
<pre class="smallcode">    psubb xmm0,[esi]   ; subtract 16 packed bytes
    pextrw eax,xmm0,7  ; extract highest word into eax
</pre>
<p class="smalltext">
<span class="smallcode">paddq</span> performs the addition of packed quad words, <span class="smallcode">psubq</span> performs the
subtraction of packed quad words, <span class="smallcode">pmuludq</span> performs an unsigned multiplication
of low double words from each pair of corresponding quad words and returns the results
in packed quad words. These instructions follow the same rules for operands as
the general MMX operations described in <a href="#2.1.14">2.1.14</a>.
</p>
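<p class="smalltext">
The lines below sketch these three instructions in both the MMX and the
extended form. The operands are arbitrary, and the 128-bit memory operand is
assumed to be aligned on a boundary of 16 bytes.
</p>
<pre class="smallcode">    paddq xmm0,xmm1     ; add two pairs of packed quad words
    psubq mm0,[ebx]     ; subtract a single quad word, MMX form
    pmuludq xmm0,[esi]  ; multiply low double words of each quad word
</pre>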
<p class="smalltext">
<span class="smallcode">pslldq</span> and <span class="smallcode">psrldq</span> perform logical shift left or right of the double
quad word in the destination operand by the amount of bytes specified in the
source operand. The destination operand should be an SSE register, the source
operand should be an 8-bit immediate value.
</p>
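<p class="smalltext">
As a small sketch with arbitrarily chosen registers, note that the count is
expressed in bytes, so a count of 16 or more clears the whole register:
</p>
<pre class="smallcode">    pslldq xmm0,4   ; shift whole register left by 4 bytes (32 bits)
    psrldq xmm1,8   ; move high quad word to low one, high becomes zero
</pre>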
<p class="smalltext">
<span class="smallcode">punpckhqdq</span> interleaves the high quad word of the source operand and the
high quad word of the destination operand and writes them to the destination
SSE register. <span class="smallcode">punpcklqdq</span> interleaves the low quad word of the source operand
and the low quad word of the destination operand and writes them to the
destination SSE register. The source operand can be a 128-bit memory location
or SSE register.
</p>
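<p class="smalltext">
A possible use is shown below, with arbitrarily chosen registers. In both
cases the quad word taken from the destination ends up in the low half of
the result and the one taken from the source in the high half.
</p>
<pre class="smallcode">    punpcklqdq xmm0,xmm1   ; low = low of xmm0, high = low of xmm1
    punpckhqdq xmm2,[esi]  ; low = high of xmm2, high = high of memory
</pre>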
<p class="smalltext">
<span class="smallcode">movntdq</span> stores packed integer data from the SSE register to memory using
a non-temporal hint. The source operand should be an SSE register, the
destination operand should be a 128-bit memory location. <span class="smallcode">movntpd</span> stores
packed double precision values from the SSE register to memory using a
non-temporal hint. Rules for operands are the same. <span class="smallcode">movnti</span> stores an integer
from a general register to memory using a non-temporal hint. The source
operand should be a 32-bit general register, the destination operand should
be a 32-bit memory location. <span class="smallcode">maskmovdqu</span> stores selected bytes from the first
operand into a 128-bit memory location using a non-temporal hint. Both
operands should be SSE registers, the second operand selects which bytes from
the first operand are written to memory. The memory location is pointed to by DI
(or EDI) register in the segment selected by DS and does not need to be
aligned.
</p>
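<p class="smalltext">
The following sketch shows the operand forms of the non-temporal stores. The
registers are arbitrary and the address in EDI is assumed to be a multiple
of 16, which the first two instructions require.
</p>
<pre class="smallcode">    movntdq [edi],xmm0     ; store 128 bits of packed integers
    movntpd [edi+16],xmm1  ; store two double precision values
    movnti [edi+32],eax    ; store 32-bit general register
    maskmovdqu xmm0,xmm1   ; store to [edi] those bytes of xmm0 for which
                           ; the highest bit of the byte in xmm1 is set
</pre>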
<p class="smalltext">
<span class="smallcode">clflush</span> writes and invalidates the cache line associated with the address
of byte specified with the operand, which should be an 8-bit memory location.
</p>
<p class="smalltext">
<span class="smallcode">lfence</span> performs a serializing operation on all instructions loading from
memory that were issued prior to it. <span class="smallcode">mfence</span> performs a serializing operation
on all instructions accessing memory that were issued prior to it, and so it
combines the functions of <span class="smallcode">sfence</span> (described in previous section) and
<span class="smallcode">lfence</span> instructions. These instructions have no operands.
</p>

<p><b>
<a name="2.1.17" class="smalltext">2.1.17  SSE3 instructions</a>
</b></p>

<p class="smalltext">
Prescott technology introduces some new instructions to improve the performance
of SSE and SSE2 - this extension is called SSE3.
</p>
<p class="smalltext">
<span class="smallcode">fisttp</span> behaves like the <span class="smallcode">fistp</span> instruction and accepts the same operands,
the only difference is that it always uses truncation, irrespective of the
rounding mode.
</p>
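<p class="smalltext">
For illustration only, with arbitrarily chosen addresses, the value from the
top of the FPU stack can be stored as a truncated integer like this:
</p>
<pre class="smallcode">    fisttp dword [ebx]  ; store st0 as 32-bit integer, truncated, and pop
    fisttp qword [esi]  ; the same with 64-bit integer
</pre>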
<p class="smalltext">
<span class="smallcode">movshdup</span> loads into destination operand the 128-bit value obtained from
the source value of the same size by filling each quad word with the two
duplicates of the value in its high double word. <span class="smallcode">movsldup</span> performs the same
action, except it duplicates the values of low double words. The destination
operand should be SSE register, the source operand can be SSE register or
128-bit memory location.
</p>
<p class="smalltext">
<span class="smallcode">movddup</span> loads the 64-bit source value and duplicates it into high and low
quad word of the destination operand. The destination operand should be SSE
register, the source operand can be SSE register or 64-bit memory location.
</p>
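<p class="smalltext">
The effect of the three duplicating moves can be sketched as follows. Here
d3, d2, d1 and d0 stand for the double words of the source from the highest
to the lowest; these names and the registers used are only assumptions made
for this example.
</p>
<pre class="smallcode">    movshdup xmm0,xmm1        ; xmm0 = d3,d3,d1,d1
    movsldup xmm0,xmm1        ; xmm0 = d2,d2,d0,d0
    movddup xmm2,qword [esi]  ; both quad words of xmm2 = 64 bits at [esi]
</pre>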
<p class="smalltext">
<span class="smallcode">lddqu</span> is functionally equivalent to <span class="smallcode">movdqu</span> instruction with memory as
source operand, but it may improve performance when the source operand crosses
a cacheline boundary. The destination operand has to be SSE register, the source
operand must be 128-bit memory location.
</p>
<p class="smalltext">
<span class="smallcode">addsubps</span> performs single precision addition of second and fourth pairs and
single precision subtraction of the first and third pairs of floating point
values in the operands. <span class="smallcode">addsubpd</span> performs double precision addition of the
second pair and double precision subtraction of the first pair of floating
point values in the operands. <span class="smallcode">haddps</span> performs the addition of two single
precision values within each quad word of source and destination operands,
and stores the results of such horizontal addition of values from destination
operand into low quad word of destination operand, and the results from the
source operand into high quad word of destination operand. <span class="smallcode">haddpd</span> performs
the addition of two double precision values within each operand, and stores
the result from destination operand into low quad word of destination operand,
and the result from source operand into high quad word of destination operand.
All these instructions need the destination operand to be SSE register, source
operand can be SSE register or 128-bit memory location.
</p>
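<p class="smalltext">
As a sketch, let a3..a0 be the single precision values initially held in
xmm0 and b3..b0 the ones in xmm1, listed from the highest to the lowest
(this naming is an assumption of the example). Each line below is meant to
be read separately, starting from those initial values.
</p>
<pre class="smallcode">    addsubps xmm0,xmm1  ; xmm0 = a3+b3, a2-b2, a1+b1, a0-b0
    haddps xmm0,xmm1    ; xmm0 = b3+b2, b1+b0, a3+a2, a1+a0
    haddpd xmm2,[esi]   ; low = sum of values in xmm2, high = sum from memory
</pre>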
<p class="smalltext">
<span class="smallcode">monitor</span> sets up an address range for monitoring of write-back stores. It
needs its three operands to be the EAX, ECX and EDX registers, in that order. <span class="smallcode">mwait</span>
waits for a write-back store to the address range set up by the <span class="smallcode">monitor</span>
instruction. It uses two operands with additional parameters, first being the
EAX and second the ECX register.
</p>
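<p class="smalltext">
A minimal sketch of the pair is given below. It assumes that EAX already
holds the address to be monitored and that no extensions or hints are
requested. On most systems these instructions are usable only by privileged
code.
</p>
<pre class="smallcode">    xor ecx,ecx          ; no extensions
    xor edx,edx          ; no hints
    monitor eax,ecx,edx  ; eax assumed to hold the monitored address
    xor eax,eax          ; no hints for mwait
    mwait eax,ecx        ; wait for a store to the monitored range
</pre>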
<p class="smalltext">
The functionality of SSE3 is further extended by the set of Supplemental
SSE3 instructions (SSSE3). They generally follow the same rules for operands
as all the MMX operations extended by SSE.
</p>
<p class="smalltext">
<span class="smallcode">phaddw</span> and <span class="smallcode">phaddd</span> perform the horizontal addition of the pairs of
adjacent values from both the source and destination operand, and store the
sums into the destination (sums from the destination operand go into the lower
part of destination register, sums from the source operand into the upper
part). They operate on 16-bit or 32-bit chunks, respectively.
<span class="smallcode">phaddsw</span> performs the same operation on signed 16-bit packed values, but the
result of each addition is saturated. <span class="smallcode">phsubw</span> and <span class="smallcode">phsubd</span> analogously
perform the horizontal subtraction of 16-bit or 32-bit packed values, and
<span class="smallcode">phsubsw</span> performs the horizontal subtraction of signed 16-bit packed values
with saturation.
</p>
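<p class="smalltext">
A few possible forms are shown below with arbitrarily chosen operands. The
128-bit memory operand is assumed to be aligned on a boundary of 16 bytes.
</p>
<pre class="smallcode">    phaddw xmm0,xmm1    ; low four words = pair sums from xmm0,
                        ; high four words = pair sums from xmm1
    phaddd mm0,mm1      ; MMX form, two 32-bit sums
    phsubsw xmm2,[esi]  ; saturated horizontal subtraction of words
</pre>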
<p class="smalltext">
<span class="smallcode">pabsb</span>, <span class="smallcode">pabsw</span> and <span class="smallcode">pabsd</span> calculate the absolute value of each
packed signed value in source operand and store them into the destination
register. They operate on 8-bit, 16-bit and 32-bit elements respectively.
</p>
<p class="smalltext">
<span class="smallcode">pmaddubsw</span> multiplies signed 8-bit values from the source operand with the
corresponding unsigned 8-bit values from the destination operand to produce
intermediate 16-bit values, and every adjacent pair of those intermediate
values is then added horizontally and those 16-bit sums are stored into the
destination operand.
</p>
<p class="smalltext">
<span class="smallcode">pmulhrsw</span> multiplies corresponding 16-bit integers from the source and
destination operand to produce intermediate 32-bit values, and the 16 bits
next to the highest bit of each of those values are then rounded and packed
into the destination operand.
</p>
<p class="smalltext">
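<span class="smallcode">pshufb</span> shuffles the bytes in the destination operand according to the
mask provided by source operand - each of the bytes in source operand is
an index of the byte of the original destination value that is copied into
the corresponding position of the result. If the highest bit of a mask byte
is set, the corresponding byte of the result is zeroed instead.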
</p>
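<p class="smalltext">
As an illustration, the sketch below reverses the order of bytes in an SSE
register. The <span class="smallcode">reverse_mask</span> label is a hypothetical name introduced only for
this example, and its data is assumed to be placed in a data area aligned on
16 bytes.
</p>
<pre class="smallcode">    pshufb xmm0,dqword [reverse_mask]  ; byte 0 of result = old byte 15, etc.

    align 16
    reverse_mask db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
</pre>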
<p class="smalltext">
<span class="smallcode">psignb</span>, <span class="smallcode">psignw</span> and <span class="smallcode">psignd</span> perform the operation on 8-bit, 16-bit or
32-bit integers in destination operand, depending on the signs of the values
in the source. If the value in source is negative, the corresponding value in
the destination register is negated, if the value in source is positive, no
operation is performed on the corresponding value, and if the
value in source is zero, the value in destination is zeroed, too.
</p>
<p class="smalltext">
<span class="smallcode">palignr</span> appends the source operand to the destination operand to form the
intermediate value of twice the size, and then extracts into the destination
register the 64 or 128 bits that are right-aligned to the byte offset
specified by the third operand, which should be an 8-bit immediate value. This
is the only SSSE3 instruction that takes three arguments.
</p>
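<p class="smalltext">
For example, with arbitrarily chosen registers, the instruction below treats
xmm0 and xmm1 as one 32-byte value with xmm1 as its lower half, and extracts
16 bytes starting at byte offset 4:
</p>
<pre class="smallcode">    palignr xmm0,xmm1,4  ; xmm0 = bytes 4..15 of xmm1 followed by
                         ; bytes 0..3 of the original xmm0
</pre>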

<p><b>
<a name="2.1.18" class="smalltext">2.1.18  AMD 3DNow! instructions</a>
</b></p>

<p class="smalltext">
The 3DNow! extension adds new MMX instructions to those described in <a href="#2.1.14">2.1.14</a>,
and introduces operation on the 64-bit packed floating point values, each
consisting of two single precision floating point values.
</p>
<p class="smalltext">
These instructions follow the same rules as the general MMX operations, the
destination operand should be a MMX register, the source operand can be a MMX
register or 64-bit memory location. <span class="smallcode">pavgusb</span> computes the rounded averages
of packed unsigned bytes. <span class="smallcode">pmulhrw</span> performs a signed multiplication of the packed
words, rounds the high word of each double word result and stores them in the
destination operand. <span class="smallcode">pi2fd</span> converts packed double word integers into
packed floating point values. <span class="smallcode">pf2id</span> converts packed floating point values
into packed double word integers using truncation. <span class="smallcode">pi2fw</span> converts packed
word integers into packed floating point values, only low words of each
double word in source operand are used. <span class="smallcode">pf2iw</span> converts packed floating
point values to packed word integers, results are extended to double words
using the sign extension. <span class="smallcode">pfadd</span> adds packed floating point values. <span class="smallcode">pfsub</span>
and <span class="smallcode">pfsubr</span> subtract packed floating point values, the first one subtracts
source values from destination values, the second one subtracts destination
values from the source values. <span class="smallcode">pfmul</span> multiplies packed floating point
values. <span class="smallcode">pfacc</span> adds the low and high floating point values of the destination
operand, storing the result in the low double word of destination, and adds
the low and high floating point values of the source operand, storing the
result in the high double word of destination. <span class="smallcode">pfnacc</span> subtracts the high
floating point value of the destination operand from the low, storing the
result in the low double word of destination, and subtracts the high floating
point value of the source operand from the low, storing the result in the high
double word of destination. <span class="smallcode">pfpnacc</span> subtracts the high floating point value
of the destination operand from the low, storing the result in the low double
word of destination, and adds the low and high floating point values of the
source operand, storing the result in the high double word of destination.
<span class="smallcode">pfmax</span> and <span class="smallcode">pfmin</span> compute the maximum and minimum of floating point values.
<span class="smallcode">pswapd</span> reverses the high and low double word of the source operand. <span class="smallcode">pfrcp</span>
returns the estimates of the reciprocals of floating point values from the
source operand, <span class="smallcode">pfrsqrt</span> returns the estimates of the reciprocal square
roots of floating point values from the source operand, <span class="smallcode">pfrcpit1</span> performs
the first step in the Newton-Raphson iteration to refine the reciprocal
approximation produced by <span class="smallcode">pfrcp</span> instruction, <span class="smallcode">pfrsqit1</span> performs the first
step in the Newton-Raphson iteration to refine the reciprocal square root
approximation produced by <span class="smallcode">pfrsqrt</span> instruction, <span class="smallcode">pfrcpit2</span> performs the
second and final step in the Newton-Raphson iteration to refine the reciprocal
approximation or the reciprocal square root approximation. <span class="smallcode">pfcmpeq</span>,
<span class="smallcode">pfcmpge</span> and <span class="smallcode">pfcmpgt</span> compare the packed floating point values and set
all bits or zero all bits of the corresponding data element in the
destination operand according to the result of comparison, first checks
whether values are equal, second checks whether destination value is greater
or equal to source value, third checks whether destination value is greater
than source value.
</p>
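<p class="smalltext">
The reciprocal instructions are usually combined into a short sequence. The
sketch below computes the reciprocal of a single precision value stored at
the address in ESI. The registers and the address are chosen only for this
example.
</p>
<pre class="smallcode">    movd mm0,[esi]    ; load the value into low double word of mm0
    pfrcp mm1,mm0     ; estimate of reciprocal, in both halves of mm1
    pfrcpit1 mm0,mm1  ; first Newton-Raphson step
    pfrcpit2 mm0,mm1  ; second step, mm0 = refined reciprocal
</pre>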
<p class="smalltext">
<span class="smallcode">prefetch</span> and <span class="smallcode">prefetchw</span> load the line of data from memory that contains
byte specified with the operand into the data cache, <span class="smallcode">prefetchw</span> instruction
should be used when the data in the cache line is expected to be modified,
otherwise the <span class="smallcode">prefetch</span> instruction should be used. The operand should be an
8-bit memory location.
</p>
<p class="smalltext">
<span class="smallcode">femms</span> performs a fast clear of MMX state. This instruction has no
operands.
</p>

<p><b>
<a name="2.1.19" class="smalltext">2.1.19  The x86-64 long mode instructions</a>
</b></p>
<p class="smalltext">
The AMD64 and EM64T architectures (we will use the common name x86-64 for them
both) extend the x86 instruction set for the 64-bit processing. While legacy
and compatibility modes use the same set of registers and instructions, the
new long mode extends the x86 operations to 64 bits and introduces several new
registers. You can turn on generating the code for this mode with the <span class="smallcode">use64</span>
directive.
</p>
<p class="smalltext">
Each of the general purpose registers is extended to 64 bits, and eight
entirely new general purpose registers and also eight new SSE registers are added.
See table <a href="#_2.4">2.4</a> for the summary of new registers (only the ones that were not
listed in table <a href="#_1.2">1.2</a>). The general purpose registers of smaller sizes are the
low order portions of the larger ones. You can still access the <span class="smallcode">ah</span>, <span class="smallcode">bh</span>,
<span class="smallcode">ch</span> and <span class="smallcode">dh</span> registers in long mode, but you cannot use them in the same
instruction with any of the new registers.
</p>
<p class="smalltext">
<b><a name="_2.4">Table 2.4  New registers in long mode</a></b>
</p>
<table class="doctable" style="width: 270px;">
  <tr>
    <th style="width: 70px;">Type</th>
    <td colspan="4">General</td>
    <td>SSE</td>
  </tr>
  <tr>
    <th style="width: 70px;">Bits</th>
    <td>8</td>
    <td>16</td>
    <td>32</td>
    <td>64</td>
    <td>128</td>
  </tr>
  <tr>
    <td/>
    <td>
      <table class="intable">
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode">spl</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">bpl</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">sil</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">dil</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r8b</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r9b</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r10b</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r11b</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r12b</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r13b</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r14b</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r15b</span></td>
        </tr>
      </table>
    </td>
    <td>
      <table class="intable">
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r8w</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r9w</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r10w</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r11w</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r12w</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r13w</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r14w</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r15w</span></td>
        </tr>
      </table>
    </td>
    <td>
      <table class="intable">
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r8d</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r9d</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r10d</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r11d</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r12d</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r13d</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r14d</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r15d</span></td>
        </tr>
      </table>
    </td>
    <td>
      <table class="intable">
        <tr>
          <td><span class="smallcode">rax</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">rcx</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">rdx</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">rbx</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">rsp</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">rbp</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">rsi</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">rdi</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r8</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r9</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r10</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r11</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r12</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r13</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r14</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">r15</span></td>
        </tr>
      </table>
    </td>
    <td>
      <table class="intable">
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode"> </span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm8</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm9</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm10</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm11</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm12</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm13</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm14</span></td>
        </tr>
        <tr>
          <td><span class="smallcode">xmm15</span></td>
        </tr>
      </table>
    </td>
  </tr>
</table>
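<p class="smalltext">
The restriction on the <span class="smallcode">ah</span>, <span class="smallcode">bh</span>, <span class="smallcode">ch</span> and <span class="smallcode">dh</span> registers can be illustrated
with a few lines of long mode code. The registers are chosen arbitrarily, and
the rejected combinations are left as comments.
</p>
<pre class="smallcode">    mov ah,bl      ; allowed, only the old registers are used
    mov sil,r8b    ; allowed, only the new byte registers are used
    ; mov ah,r8b   ; not allowed, ah together with a new register
    ; mov ah,sil   ; not allowed for the same reason
</pre>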
<p class="smalltext">
In general any instruction from x86 architecture, which allowed 16-bit or
32-bit operand sizes, in long mode allows also the 64-bit operands. The 64-bit
registers should be used for addressing in long mode, the 32-bit addressing
is also allowed, but it's not possible to use the addresses based on 16-bit
registers. Below are samples of new operations possible in long mode, using
the <span class="smallcode">mov</span> instruction as an example:
</p>
<pre class="smallcode">    mov rax,r8   ; transfer 64-bit general register
    mov al,[rbx] ; transfer memory addressed by 64-bit register
</pre>
<p class="smalltext">
The long mode uses also the instruction pointer based addresses, you can
specify it manually with the special RIP register symbol, but such addressing
is also automatically generated by flat assembler, since there is no 64-bit
absolute addressing in long mode. You can still force the assembler to use the
32-bit absolute addressing by putting the <span class="smallcode">dword</span>
size override for address inside the square brackets.
There is also one exception, where the 64-bit absolute addressing is possible,
it's the <span class="smallcode">mov</span> instruction with one of the
operands being the accumulator register, and the second being the memory operand.
To force the assembler to use the 64-bit absolute addressing there, use the
<span class="smallcode">qword</span> size operator for address inside the square brackets.
When no size operator is applied to address, assembler generates the optimal form
automatically.
</p>
<pre class="smallcode">    mov [qword 0],rax  ; absolute 64-bit addressing
    mov [dword 0],r15d ; absolute 32-bit addressing
    mov [0],rsi        ; automatic RIP-relative addressing
    mov [rip+3],sil    ; manual RIP-relative addressing
</pre>
<p class="smalltext">
Also, only signed 32-bit values are possible as immediate operands for 64-bit
operations, with the only exception being the <span class="smallcode">mov</span> instruction with
destination operand being 64-bit general purpose register. Trying to force the
64-bit immediate with any other instruction will cause an error.
</p>
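<p class="smalltext">
The difference can be sketched with the lines below, where the values and
registers are arbitrary:
</p>
<pre class="smallcode">    mov rax,1234567890ABCDEFh  ; full 64-bit immediate, mov to register
    add rax,-1                 ; 32-bit immediate, sign extended to 64 bits
    mov qword [rbx],7FFFFFFFh  ; memory destination, 32-bit immediate only
</pre>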
<p class="smalltext">
If any operation is performed on the 32-bit general registers in long mode,
the upper 32 bits of the 64-bit registers containing them are filled with
zeros. This is unlike the operations on 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
</p>
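<p class="smalltext">
A short sketch of this behavior, with register contents noted in the
comments (the registers and values are chosen only for this example):
</p>
<pre class="smallcode">    mov rax,-1  ; rax = 0FFFFFFFFFFFFFFFFh
    mov eax,1   ; rax = 1, upper 32 bits are zeroed
    mov rbx,-1  ; rbx = 0FFFFFFFFFFFFFFFFh
    mov bx,1    ; rbx = 0FFFFFFFFFFFF0001h, upper bits preserved
    mov bl,2    ; rbx = 0FFFFFFFFFFFF0002h
</pre>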
<p class="smalltext">
Three new type conversion instructions are available. The <span class="smallcode">cdqe</span> sign extends
the double word in EAX into quad word and stores the result in RAX register.
<span class="smallcode">cqo</span> sign extends the quad word in RAX into double quad word and stores the
extra bits in the RDX register. These instructions have no operands. <span class="smallcode">movsxd</span>
sign extends the double word source operand, being either the 32-bit register
or memory, into 64-bit destination operand, which has to be register.
No analogous instruction is needed for the zero extension, since it is done
automatically by any operations on 32-bit registers, as noted in previous
paragraph. And the <span class="smallcode">movzx</span> and <span class="smallcode">movsx</span> instructions, conforming to the general
rule, can be used with 64-bit destination operand, allowing extension of byte
or word values into quad words.
</p>
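<p class="smalltext">
The conversions described above may be written as in the following sketch,
where all the operands are chosen arbitrarily:
</p>
<pre class="smallcode">    cdqe                    ; rax = sign extended eax
    cqo                     ; rdx:rax = sign extended rax
    movsxd rcx,dword [rbx]  ; sign extend double word from memory
    movsxd rdx,eax          ; sign extend 32-bit register
    movzx r8,byte [rsi]     ; zero extend byte into quad word
    movsx r9,ax             ; sign extend word into quad word
    mov r10d,eax            ; zero extension of double word is automatic
</pre>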
<p class="smalltext">
All the binary arithmetic and logical instructions are promoted to allow
64-bit operands in long mode. The use of decimal arithmetic instructions in
long mode is prohibited.
</p>
<p class="smalltext">
The stack operations, like <span class="smallcode">push</span> and <span class="smallcode">pop</span> in long mode default to 64-bit
operands and it's not possible to use 32-bit operands with them. The <span class="smallcode">pusha</span>
and <span class="smallcode">popa</span> are disallowed in long mode.
</p>
<p class="smalltext">
The indirect near jumps and calls in long mode default to 64-bit operands and
it's not possible to use the 32-bit operands with them. On the other hand, the
indirect far jumps and calls allow any operands that were allowed by the x86
architecture and also 80-bit memory operand is allowed (though only EM64T seems
to implement such variant), with the first eight bytes defining the offset and
two last bytes specifying the selector. The direct far jumps and calls are not
allowed in long mode.
</p>
<p class="smalltext">
The I/O instructions, <span class="smallcode">in</span>, <span class="smallcode">out</span>, <span class="smallcode">ins</span> and <span class="smallcode">outs</span> are the exceptional
instructions that are not extended to accept quad word operands in long mode.
But all other string operations are, and there are new short forms <span class="smallcode">movsq</span>,
<span class="smallcode">cmpsq</span>, <span class="smallcode">scasq</span>, <span class="smallcode">lodsq</span> and <span class="smallcode">stosq</span> introduced for the variants of string
operations for 64-bit string elements. The RSI and RDI registers are used by
default to address the string elements.
</p>
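<p class="smalltext">
As a sketch, the lines below copy and then clear blocks of 16 quad words. It
is assumed that RSI and RDI already point to the source and destination
buffers; the counts are arbitrary.
</p>
<pre class="smallcode">    cld          ; process the strings upwards
    mov ecx,16   ; count, zero extended into rcx
    rep movsq    ; copy 16 quad words from [rsi] to [rdi]
    xor eax,eax  ; rax = 0
    mov ecx,16
    rep stosq    ; fill next 16 quad words at [rdi] with zeros
</pre>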
<p class="smalltext">
The <span class="smallcode">lfs</span>, <span class="smallcode">lgs</span> and <span class="smallcode">lss</span> instructions are extended to accept 80-bit source
memory operand with 64-bit destination register (though only EM64T seems to
implement such variant). The <span class="smallcode">lds</span> and <span class="smallcode">les</span> are disallowed in long mode.
</p>
<p class="smalltext">
The system instructions like <span class="smallcode">lgdt</span> which required the 48-bit memory operand,
in long mode require the 80-bit memory operand.
</p>
<p class="smalltext">
The <span class="smallcode">cmpxchg16b</span> is the 64-bit equivalent of <span class="smallcode">cmpxchg8b</span> instruction, it uses
the double quad word memory operand and 64-bit registers to perform the
analogous operation.
</p>
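<p class="smalltext">
A minimal sketch follows, assuming that RSI holds an address aligned on a
boundary of 16 bytes, which this instruction requires:
</p>
<pre class="smallcode">    lock cmpxchg16b [rsi]  ; compare rdx:rax with 16 bytes at [rsi],
                           ; if equal set ZF and store rcx:rbx there,
                           ; otherwise clear ZF and load them into rdx:rax
</pre>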
<p class="smalltext">
<span class="smallcode">swapgs</span> is the new instruction, which swaps the contents of GS register and
the KernelGSbase model-specific register (MSR address 0C0000102h).
</p>
<p class="smalltext">
<span class="smallcode">syscall</span> and <span class="smallcode">sysret</span> are the pair of new instructions that provide the
functionality similar to <span class="smallcode">sysenter</span> and <span class="smallcode">sysexit</span> in long mode, where the
latter pair is disallowed. The <span class="smallcode">sysexitq</span> and <span class="smallcode">sysretq</span> mnemonics provide the
64-bit versions of <span class="smallcode">sysexit</span> and <span class="smallcode">sysret</span> instructions.
</p>
<p class="smalltext">
The <span class="smallcode">rdmsrq</span> and <span class="smallcode">wrmsrq</span> mnemonics are the 64-bit variants of the <span class="smallcode">rdmsr</span>
and <span class="smallcode">wrmsr</span> instructions.
</p>

<p><b>
<a name="2.1.20" class="smalltext">2.1.20  SSE4 instructions</a>
</b></p>

<p class="smalltext">
There are actually three different sets of instructions under the name SSE4.
Intel designed two of them, SSE4.1 and SSE4.2, with latter extending the
former into the full Intel's SSE4 set. On the other hand, the implementation
by AMD includes only a few instructions from this set, but also contains
some additional instructions, that are called the SSE4a set.
</p>
<p class="smalltext">
The SSE4.1 instructions mostly follow the same rules for operands, as
the basic SSE operations, so they require destination operand to be SSE
register and source operand to be 128-bit memory location or SSE register,
and some operations require a third operand, the 8-bit immediate value.
</p>
<p class="smalltext">
<span class="smallcode">pmulld</span> performs a signed multiplication of the packed double words and
stores the low double words of the results in the destination operand.
<span class="smallcode">pmuldq</span> performs two signed multiplications of the corresponding double
words in the lower quad words of operands, and stores the results as
packed quad words into the destination register. <span class="smallcode">pminsb</span> and <span class="smallcode">pmaxsb</span>
return the minimum or maximum values of packed signed bytes, <span class="smallcode">pminuw</span> and
<span class="smallcode">pmaxuw</span> return the minimum and maximum values of packed unsigned words,
<span class="smallcode">pminud</span>, <span class="smallcode">pmaxud</span>, <span class="smallcode">pminsd</span> and <span class="smallcode">pmaxsd</span> return minimum or maximum values
of packed unsigned or signed words. These instruction complement the
instructions computing packed minimum or maximum introduced by SSE.
</p>
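<p class="smalltext">
A few of these instructions in their typical forms are sketched below. The
choice of registers and the use of ESI as the memory pointer are only
illustrative assumptions.
</p>
<pre class="smallcode">    pmulld xmm0,xmm1  ; four products, low double word of each is kept
    pmuldq xmm2,xmm3  ; two signed products stored as packed quad words
    pminsb xmm4,[esi] ; signed minimum of each byte, source is 128-bit memory
    pmaxud xmm5,xmm6  ; unsigned maximum of each double word
</pre>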
<p class="smalltext">
<span class="smallcode">ptest</span> sets the ZF flag to one when the result of bitwise AND of
both operands is zero, and zeroes the ZF otherwise. It also sets the CF flag
to one when the result of bitwise AND of the source operand with
the bitwise NOT of the destination operand is zero, and zeroes the CF otherwise.
<span class="smallcode">pcmpeqq</span> compares packed quad words for equality, and fills the
corresponding elements of destination operand with either ones or zeros,
depending on the result of comparison.
</p>
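<p class="smalltext">
As a small sketch of how the flags set by <span class="smallcode">ptest</span> might be used, testing
a register against itself checks whether it contains only zero bits. The
registers chosen here are arbitrary.
</p>
<pre class="smallcode">    ptest xmm0,xmm0   ; ZF=1 only when xmm0 is all zero bits
    setz al           ; AL=1 if xmm0 was zero, AL=0 otherwise
    pcmpeqq xmm1,xmm2 ; quad words of xmm1 become all ones where equal
</pre>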
<p class="smalltext">
<span class="smallcode">packusdw</span> converts packed signed double words from both the source and
destination operand into the unsigned words using saturation, and stores
the eight resulting word values into the destination register.
</p>
<p class="smalltext">
<span class="smallcode">pmovsxbw</span> and <span class="smallcode">pmovzxbw</span> perform sign extension or zero extension of the
lowest eight byte values from the source operand into packed word values in
destination operand. <span class="smallcode">pmovsxbd</span> and <span class="smallcode">pmovzxbd</span> perform sign extension or zero
extension of the lowest four byte values from the source operand into packed
double word values in destination operand. <span class="smallcode">pmovsxbq</span> and <span class="smallcode">pmovzxbq</span> perform
sign extension or zero extension of the lowest two byte values from the source
operand into packed quad word value in destination operand. <span class="smallcode">pmovsxwd</span> and
<span class="smallcode">pmovzxwd</span> perform sign extension or zero extension of the lowest four word
values from the source operand into packed double words in destination operand.
<span class="smallcode">pmovsxwq</span> and <span class="smallcode">pmovzxwq</span> perform sign extension or zero extension of the
lowest two word values from the source operand into packed quad words in
destination operand. <span class="smallcode">pmovsxdq</span> and <span class="smallcode">pmovzxdq</span> perform sign extension or
zero extension of the lowest two double word values from the source operand
into packed quad words in destination operand.
</p>
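<p class="smalltext">
The register forms of some of these conversions might look as follows (the
registers are chosen arbitrarily for illustration):
</p>
<pre class="smallcode">    pmovzxbw xmm0,xmm1 ; low 8 bytes of xmm1 into 8 zero-extended words
    pmovsxbd xmm2,xmm3 ; low 4 bytes of xmm3 into 4 sign-extended double words
    pmovsxdq xmm4,xmm5 ; low 2 double words of xmm5 into 2 sign-extended quad words
</pre>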
<p class="smalltext">
<span class="smallcode">phminposuw</span> finds the minimum unsigned word value in source operand
and places it into the lowest word of destination operand. The index of that
word within the source is stored in bits 16-18 of destination, and the
remaining upper bits of destination are set to zero.
</p>
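<p class="smalltext">
A minimal sketch of retrieving the result into a general purpose register,
assuming the words to search are already in XMM1:
</p>
<pre class="smallcode">    phminposuw xmm0,xmm1 ; minimum word of xmm1 into the lowest word of xmm0
    movd eax,xmm0        ; the low 16 bits of EAX now hold that minimum
</pre>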
<p class="smalltext">
<span class="smallcode">roundps</span>, <span class="smallcode">roundss</span>, <span class="smallcode">roundpd</span> and <span class="smallcode">roundsd</span> perform the rounding of
packed or individual floating point value of single or double precision,
using the rounding mode specified by the third operand.
</p>
<pre class="smallcode">    roundsd xmm0,xmm1,0011b ; round toward zero
</pre>
<p class="smalltext">
<span class="smallcode">dpps</span> calculates the dot product of packed single precision floating point
values, that is, it multiplies the corresponding pairs of values from the source and
destination operands and then sums the products up. The high four bits of the
8-bit immediate third operand control which products are calculated and included
in the sum, and the low four bits control which elements of the destination
receive the resulting dot product (the other elements are filled with zero).
<span class="smallcode">dppd</span> calculates the dot product of packed double precision floating point values.
Bits 4 and 5 of the third operand control which products are calculated and
added, and bits 0 and 1 of this value control which elements in the destination
register get filled with the result. <span class="smallcode">mpsadbw</span> calculates multiple sums
of absolute differences of unsigned bytes. The third operand controls, with the
value in bits 0-1, which of the four-byte blocks in the source operand is used to
calculate the absolute differences, and with the value in bit 2, at which of the
first two four-byte blocks in the destination operand the calculation of multiple
sums starts. Each sum is calculated from four absolute differences between the
corresponding unsigned bytes in the source and destination blocks, and each next
sum is calculated in the same way, but taking the four bytes from the destination
at the position one byte after the position of the previous block. The four bytes
from the source stay the same each time. This way eight sums of absolute
differences are calculated and stored as packed word values into the
destination operand. The instructions described in this paragraph follow the
same rules for operands as the <span class="smallcode">roundps</span> instruction.
</p>
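<p class="smalltext">
The control bits described above can be illustrated with a few immediate
values. The particular masks below are just examples chosen for this sketch.
</p>
<pre class="smallcode">    dpps xmm0,xmm1,11110001b ; sum all four products, result into lowest element
    dppd xmm2,xmm3,00110011b ; sum both products, result into both elements
    mpsadbw xmm4,xmm5,100b   ; source block 0, destination blocks start at byte 4
</pre>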
<p class="smalltext">
<span class="smallcode">blendps</span>, <span class="smallcode">blendvps</span>, <span class="smallcode">blendpd</span> and <span class="smallcode">blendvpd</span> conditionally copy the
values from source operand into the destination operand, depending on the bits
of the mask provided by third operand. If a mask bit is set, the corresponding
element of source is copied into the same place in destination, otherwise this
position in destination is left unchanged. The rules for the first two operands
are the same, as for general SSE instructions. <span class="smallcode">blendps</span> and <span class="smallcode">blendpd</span> need
third operand to be 8-bit immediate, and they operate on single or double
precision values, respectively. <span class="smallcode">blendvps</span> and <span class="smallcode">blendvpd</span> require third operand
to be the XMM0 register.
</p>
<pre class="smallcode">    blendvps xmm3,xmm7,xmm0 ; blend according to mask
</pre>
<p class="smalltext">
<span class="smallcode">pblendw</span> conditionally copies word elements from the source operand into the
destination, depending on the bits of mask provided by third operand, which
needs to be 8-bit immediate value. <span class="smallcode">pblendvb</span> conditionally copies byte
elements from the source operand into destination, depending on mask defined
by the third operand, which has to be XMM0 register. These instructions follow
the same rules for operands as <span class="smallcode">blendps</span> and <span class="smallcode">blendvps</span> instructions,
respectively.
</p>
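<p class="smalltext">
The variants with an immediate mask could be used as in the sketch below,
where each set bit of the mask selects one element to be copied from the
source. The mask values are arbitrary examples.
</p>
<pre class="smallcode">    blendps xmm0,xmm1,0101b     ; copy elements 0 and 2 from xmm1
    blendpd xmm2,xmm3,10b       ; copy only the high double precision element
    pblendw xmm4,xmm5,00001111b ; copy the four low words
    pblendvb xmm6,xmm7,xmm0     ; copy the bytes selected by mask in XMM0
</pre>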
<p class="smalltext">
<span class="smallcode">insertps</span> inserts a single precision floating point value taken from the
position in source operand specified by bits 6-7 of third operand into location
in destination register selected by bits 4-5 of third operand. Additionally,
the low four bits of third operand control, which elements in destination
register will be set to zero. The first two operands follow the same rules as
for the general SSE operation, the third operand should be 8-bit immediate.
</p>
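<p class="smalltext">
The following sketch shows how the three bit fields of the immediate operand
combine. The values were picked only to demonstrate each field.
</p>
<pre class="smallcode">    insertps xmm0,xmm1,01100000b ; element 1 of xmm1 into element 2 of xmm0
    insertps xmm2,xmm3,00001110b ; element 0 into element 0, zero elements 1-3
</pre>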
<p class="smalltext">
<span class="smallcode">extractps</span> extracts a single precision floating point value taken from the
location in source operand specified by low two bits of third operand, and
stores it into the destination operand. The destination can be a 32-bit memory
value or general purpose register, the source operand must be SSE register,
and the third operand should be 8-bit immediate value.
</p>
<pre class="smallcode">    extractps edx,xmm3,3 ; extract the highest value
</pre>
<p class="smalltext">
<span class="smallcode">pinsrb</span>, <span class="smallcode">pinsrd</span> and <span class="smallcode">pinsrq</span> copy a byte, double word or quad word from
the source operand into the location of destination operand determined by the
third operand. The destination operand has to be SSE register, the source
operand can be a memory location of appropriate size, or the 32-bit general
purpose register (but 64-bit general purpose register for <span class="smallcode">pinsrq</span>, which is
only available in long mode), and the third operand has to be 8-bit immediate
value. These instructions complement the <span class="smallcode">pinsrw</span> instruction operating on SSE
register destination, which was introduced by SSE2.
</p>
<pre class="smallcode">    pinsrd xmm4,eax,1 ; insert double word into second position
</pre>
<p class="smalltext">
<span class="smallcode">pextrb</span>, <span class="smallcode">pextrw</span>, <span class="smallcode">pextrd</span> and <span class="smallcode">pextrq</span> copy a byte, word, double word or
quad word from the location in source operand specified by third operand, into
the destination. The source operand should be SSE register, the third operand
should be 8-bit immediate, and the destination operand can be memory location
of appropriate size, or the 32-bit general purpose register (but 64-bit general
purpose register for <span class="smallcode">pextrq</span>, which is only available in long mode). The
<span class="smallcode">pextrw</span> instruction with SSE register as source was already introduced by
SSE2, but SSE4 extends it to allow memory operand as destination.
</p>
<pre class="smallcode">    pextrw [ebx],xmm3,7 ; extract highest word into memory
</pre>
<p class="smalltext">
<span class="smallcode">movntdqa</span> loads double quad word from the source operand to the destination
using a non-temporal hint. The destination operand should be SSE register,
and the source operand should be 128-bit memory location.
</p>
<p class="smalltext">
The SSE4.2, described below, adds not only some new operations on SSE
registers, but also introduces some completely new instructions operating on
general purpose registers only.
</p>
<p class="smalltext">
<span class="smallcode">pcmpistri</span> compares two zero-ended (implicit length) strings provided in
its source and destination operand and generates an index stored to ECX;
<span class="smallcode">pcmpistrm</span> performs the same comparison and generates a mask stored to XMM0.
<span class="smallcode">pcmpestri</span> compares two strings of explicit lengths, with length provided
in EAX for the destination operand and in EDX for the source operand, and
generates an index stored to ECX; <span class="smallcode">pcmpestrm</span> performs the same comparison
and generates a mask stored to XMM0. The source and destination operand follow
the same rules as for general SSE instructions, the third operand should be
8-bit immediate value determining the details of performed operation - refer to
Intel documentation for information on those details.
</p>
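<p class="smalltext">
The operand forms can be sketched as follows. The immediate value of zero is
used here only as the simplest case (according to Intel documentation it
selects unsigned bytes and the "equal any" comparison), and the lengths loaded
into EAX and EDX are arbitrary.
</p>
<pre class="smallcode">    pcmpistri xmm0,xmm1,0  ; implicit lengths, index returned in ECX
    pcmpistrm xmm0,xmm1,0  ; implicit lengths, mask returned in XMM0
    mov eax,5              ; length of the string in destination operand
    mov edx,16             ; length of the string in source operand
    pcmpestri xmm0,[esi],0 ; explicit lengths, index returned in ECX
</pre>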
<p class="smalltext">
<span class="smallcode">pcmpgtq</span> compares packed quad words, and fills the corresponding elements of
destination operand with either ones or zeros, depending on whether the value
in destination is greater than the one in source, or not. This instruction
follows the same rules for operands as <span class="smallcode">pcmpeqq</span>.
</p>
<p class="smalltext">
<span class="smallcode">crc32</span> accumulates a CRC32 value for the source operand starting with
initial value provided by destination operand, and stores the result in
destination. Unless in long mode, the destination operand should be a 32-bit
general purpose register, and the source operand can be a byte, word, or double
word register or memory location. In long mode the destination operand can
also be a 64-bit general purpose register, and the source operand in such case
can be a byte or quad word register or memory location.
</p>
<pre class="smallcode">    crc32 eax,dl          ; accumulate CRC32 on byte value
    crc32 eax,word [ebx]  ; accumulate CRC32 on word value
    crc32 rax,qword [rbx] ; accumulate CRC32 on quad word value
</pre>
<p class="smalltext">
<span class="smallcode">popcnt</span> calculates the number of bits set in the source operand, which can
be 16-bit, 32-bit, or 64-bit general purpose register or memory location,
and stores this count in the destination operand, which has to be register of
the same size as source operand. The 64-bit variant is available only in long
mode.
</p>
<pre class="smallcode">    popcnt ecx,eax ; count bits set to 1
</pre>
<p class="smalltext">
The SSE4a extension, which also includes the <span class="smallcode">popcnt</span> instruction introduced
by SSE4.2, at the same time adds the <span class="smallcode">lzcnt</span> instruction, which follows the
same syntax, and calculates the count of leading zero bits in source operand
(if the source operand is all zero bits, the total number of bits in source
operand is stored in destination).
</p>
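<p class="smalltext">
Since <span class="smallcode">lzcnt</span> shares the syntax of <span class="smallcode">popcnt</span>, its use might look like this
(registers chosen arbitrarily):
</p>
<pre class="smallcode">    lzcnt eax,ebx     ; count leading zero bits of EBX, 32 if EBX is zero
    lzcnt cx,word [si] ; 16-bit variant with memory source
</pre>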
<p class="smalltext">
<span class="smallcode">extrq</span> extracts the sequence of bits from the low quad word of SSE register
provided as first operand and stores them at the low end of this register,
filling the remaining bits in the low quad word with zeros. The position of bit
string and its length can either be provided with two 8-bit immediate values
as second and third operand, or by SSE register as second operand (and there
is no third operand in such case), which should contain position value in bits
8-13 and length of bit string in bits 0-5.
</p>
<pre class="smallcode">    extrq xmm0,8,7  ; extract 8 bits from position 7
    extrq xmm0,xmm5 ; extract bits defined by register
</pre>
<p class="smalltext">
<span class="smallcode">insertq</span> writes the sequence of bits from the low quad word of the source
operand into specified position in low quad word of the destination operand,
leaving the other bits in low quad word of destination intact. The position
where bits should be written and the length of bit string can either be
provided with two 8-bit immediate values as third and fourth operand, or by
the bit fields in source operand (and there are only two operands in such
case), which should contain position value in bits 72-77 and length of bit
string in bits 64-69.
</p>
<pre class="smallcode">    insertq xmm1,xmm0,4,2 ; insert 4 bits at position 2
    insertq xmm1,xmm0     ; insert bits defined by register
</pre>
<p class="smalltext">
<span class="smallcode">movntss</span> and <span class="smallcode">movntsd</span> store single or double precision floating point
value from the source SSE register into 32-bit or 64-bit destination memory
location respectively, using non-temporal hint.
</p>
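<p class="smalltext">
For illustration, assuming EDI points to the destination buffer:
</p>
<pre class="smallcode">    movntss [edi],xmm0   ; store 32-bit value with non-temporal hint
    movntsd [edi+8],xmm1 ; store 64-bit value with non-temporal hint
</pre>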

<p><b>
<a name="2.1.21" class="smalltext">2.1.21  Other extensions of instruction set</a>
</b></p>
<p class="smalltext">
There are a number of additional instruction set extensions recognized by
flat assembler, and the general syntax of the instructions introduced by those
extensions is provided here. For detailed information on the operations
performed by them, check out the manuals from Intel (for the VMX and SMX
extensions) or AMD (for the SVM extension).
</p>
<p class="smalltext">
The Virtual-Machine Extensions (VMX) provide a set of instructions for the
management of virtual machines. The <span class="smallcode">vmxon</span> instruction, which enters the VMX
operation, requires a single 64-bit memory operand, which should be a physical
address of memory region, which the logical processor may use to support VMX
operation. The <span class="smallcode">vmxoff</span> instruction, which leaves the VMX operation, has no
operands. The <span class="smallcode">vmlaunch</span> and <span class="smallcode">vmresume</span>, which launch or resume the virtual
machines, and <span class="smallcode">vmcall</span>, which allows guest software to call the VM monitor, use
no operands either.
</p>
<p class="smalltext">
The <span class="smallcode">vmptrld</span> loads the physical address of current Virtual Machine Control
Structure (VMCS) from its memory operand, <span class="smallcode">vmptrst</span> stores the pointer to
current VMCS into address specified by its memory operand, and <span class="smallcode">vmclear</span> sets
the launch state of the VMCS referenced by its memory operand to clear. These
three instructions all require a single 64-bit memory operand.
</p>
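<p class="smalltext">
The lines below only illustrate the operand syntax of the instructions
described so far; they are not a working initialization sequence, and the
registers used for addressing are arbitrary assumptions.
</p>
<pre class="smallcode">    vmxon [ebx]   ; memory at [ebx] holds the address of region for VMX operation
    vmptrld [esi] ; load the pointer to current VMCS from memory
    vmptrst [edi] ; store the pointer to current VMCS into memory
    vmclear [esi] ; set the launch state of referenced VMCS to clear
    vmxoff        ; leave the VMX operation
</pre>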
<p class="smalltext">
The <span class="smallcode">vmread</span> reads from VMCS a field specified by the source operand and
stores it into the destination operand. The source operand should be a
general purpose register, and the destination operand can be a register or
memory. The <span class="smallcode">vmwrite</span> writes into a VMCS field specified by the destination
operand the value provided by source operand. The source operand can be a
general purpose register or memory, and the destination operand must be a
register. The size of operands for those instructions should be 64-bit when
in long mode, and 32-bit otherwise.
</p>
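<p class="smalltext">
Outside of long mode the allowed operand combinations might be written as in
this sketch, where EAX is assumed to contain the field specifier and the
<span class="smallcode">use32</span> setting is there only to make the context clear:
</p>
<pre class="smallcode">    use32
    vmread ecx,eax    ; field selected by EAX into register
    vmread [edi],eax  ; field selected by EAX into memory
    vmwrite eax,ecx   ; register into field selected by EAX
    vmwrite eax,[esi] ; memory into field selected by EAX
</pre>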
<p class="smalltext">
The <span class="smallcode">invept</span> and <span class="smallcode">invvpid</span> invalidate the translation lookaside buffers
(TLBs) and paging-structure caches, either derived from extended page tables
(EPT), or based on the virtual processor identifier (VPID). These instructions
require two operands, the first one being the general purpose register
specifying the type of invalidation, and the second one being a 128-bit
memory operand providing the invalidation descriptor. The first operand
should be a 64-bit register when in long mode, and 32-bit register otherwise.
</p>
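<p class="smalltext">
A sketch of the 32-bit forms, assuming EAX holds the invalidation type and EBX
points to the descriptor:
</p>
<pre class="smallcode">    use32
    invept eax,[ebx]  ; 128-bit descriptor at [ebx]
    invvpid eax,[ebx]
</pre>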
<p class="smalltext">
The Safer Mode Extensions (SMX) provide the functionalities available
through the <span class="smallcode">getsec</span> instruction. This instruction takes no operands, and
the function that is executed is determined by the contents of EAX register
upon executing this instruction.
</p>
<p class="smalltext">
The Secure Virtual Machine (SVM) is a variant of virtual machine extension
used by AMD. The <span class="smallcode">skinit</span> instruction securely reinitializes the processor
allowing the startup of trusted software, such as the virtual machine monitor
(VMM). This instruction takes a single operand, which must be EAX, and
provides a physical address of the secure loader block (SLB).
</p>
<p class="smalltext">
The <span class="smallcode">vmrun</span> instruction is used to start a guest virtual machine,
its only operand should be an accumulator register (AX, EAX or RAX, the
last one available only in long mode) providing the physical address of the
virtual machine control block (VMCB). The <span class="smallcode">vmsave</span> stores a subset of processor
state into VMCB specified by its operand, and <span class="smallcode">vmload</span> loads the same subset
of processor state from a specified VMCB. The same operand rules as for the
<span class="smallcode">vmrun</span> apply to those two instructions.
</p>
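<p class="smalltext">
With the physical address of VMCB assumed to be already in EAX, the three
instructions would be written as:
</p>
<pre class="smallcode">    use32
    vmload eax ; load subset of processor state from VMCB
    vmrun eax  ; start the guest
    vmsave eax ; store subset of processor state into VMCB
</pre>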
<p class="smalltext">
<span class="smallcode">vmmcall</span> allows the guest software to call the VMM. This instruction takes
no operands.
</p>
<p class="smalltext">
<span class="smallcode">stgi</span> sets the global interrupt flag to 1, and <span class="smallcode">clgi</span> zeroes it. These
instructions take no operands.
</p>
<p class="smalltext">
<span class="smallcode">invlpga</span> invalidates the TLB mapping for a virtual page specified by the
first operand (which has to be accumulator register) and address space
identifier specified by the second operand (which must be ECX register).
</p>
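<p class="smalltext">
For example, in 32-bit code:
</p>
<pre class="smallcode">    use32
    invlpga eax,ecx ; virtual page address in EAX, ASID in ECX
</pre>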


<p><b>
<a name="2.2" class="mediumtext">2.2  Control directives</a>
</b></p>

<p class="smalltext">
This section describes the directives that control the assembly process. They
are processed during the assembly and may cause some blocks of instructions
to be assembled differently or not assembled at all.
</p>

<p><b>
<a name="2.2.1" class="smalltext">2.2.1  Numerical constants</a>
</b></p>
<p class="smalltext">
The <span class="smallcode">=</span> directive defines a numerical constant. It should be
preceded by the name for the constant and followed by the numerical expression
providing the value. The value of such constants can be a number or an address,
but - unlike labels - the numerical constants are not allowed to hold the
register-based addresses. Besides this difference, in their basic variant
numerical constants behave very much like labels and you can even
forward-reference them (access their values before they actually get defined).
</p>
<p class="smalltext">
There is, however, a second variant of numerical constants, which is
recognized by assembler when you try to define a constant under a name that
was already used to define a numerical constant. In such case assembler
treats that constant as an assembly-time variable and allows it to be assigned
a new value, but forbids forward-referencing it (for obvious reasons). Let's
see both variants of numerical constants in one example:
</p>
<pre class="smallcode">    dd sum
    x = 1
    x = x+2
    sum = x
</pre>
<p class="smalltext">
Here the <span class="smallcode">x</span> is an assembly-time variable, and every time it is accessed, the
value that was assigned to it the most recently is used. Thus if we tried to
access the <span class="smallcode">x</span> before it gets defined the first time, like if we wrote <span class="smallcode">dd x</span>
in place of the <span class="smallcode">dd sum</span> instruction, it would cause an error. And when it is
re-defined with the <span class="smallcode">x = x+2</span> directive, the previous value of <span class="smallcode">x</span> is used to
calculate the new one. So when the <span class="smallcode">sum</span> constant gets defined, the <span class="smallcode">x</span> has
value of 3, and this value is assigned to the <span class="smallcode">sum</span>. Since this one is defined
only once in source, it is the standard numerical constant, and can be
forward-referenced. So the <span class="smallcode">dd sum</span> is assembled as <span class="smallcode">dd 3</span>. To read more about
how the assembler is able to resolve this, see section <a href="#2.2.6">2.2.6</a>.
</p>
<p class="smalltext">
The value of numerical constant can be preceded by size operator, which can
ensure that the value will fit in the range for the specified size, and can
affect also how some of the calculations inside the numerical expression are
performed. This example:
</p>
<pre class="smallcode">    c8 = byte -1
    c32 = dword -1
</pre>
<p class="smalltext">
defines two different constants, the first one fits in 8 bits, the second one
fits in 32 bits.
</p>
<p class="smalltext">
When you need to define constant with the value of address, which may be
register-based (and thus you cannot employ numerical constant for this
purpose), you can use the extended syntax of <span class="smallcode">label</span> directive (already
described in section <a href="#1.2.3">1.2.3</a>), like:
</p>
<pre class="smallcode">    label myaddr at ebp+4
</pre>
<p class="smalltext">
which declares label placed at <span class="smallcode">ebp+4</span> address. However remember that labels,
unlike numerical constants, cannot become assembly-time variables.
</p>
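<p class="smalltext">
Such a label can then be used like any other address. In this sketch the name
<span class="smallcode">arg1</span> and the offset are hypothetical, standing for a parameter on the stack:
</p>
<pre class="smallcode">    label arg1 dword at ebp+8
    mov eax,[arg1] ; assembled as mov eax,[ebp+8]
</pre>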

<p><b>
<a name="2.2.2" class="smalltext">2.2.2  Conditional assembly</a>
</b></p>
<p class="smalltext">
<span class="smallcode">if</span> directive causes some block of instructions to be assembled only under
a certain condition. It should be followed by a logical expression specifying the
condition; instructions in the next lines will be assembled only when this
condition is met, otherwise they will be skipped. The optional <span class="smallcode">else if</span>
directive followed by a logical expression specifying an additional condition
begins the next block of instructions that will be assembled if the previous
conditions were not met, and the additional condition is met. The optional
<span class="smallcode">else</span> directive begins the block of instructions that will be assembled if
none of the conditions were met. The <span class="smallcode">end if</span> directive ends the last block of
instructions.
</p>
<p class="smalltext">
You should note that <span class="smallcode">if</span> directive is processed at assembly stage and
therefore it doesn't affect any preprocessor directives, like the definitions
of symbolic constants and macroinstructions - when the assembler recognizes the
<span class="smallcode">if</span> directive, all the preprocessing has been already finished.
</p>
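<p class="smalltext">
A short sketch of this effect, with a hypothetical symbolic constant named
<span class="smallcode">answer</span>: the definition is placed inside a block that is never assembled,
yet it still takes effect, because the preprocessor has handled it before the
<span class="smallcode">if</span> was even recognized.
</p>
<pre class="smallcode">    if 0
        answer equ 42
    end if
    db answer ; still assembled as db 42
</pre>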
<p class="smalltext">
The logical expression consists of logical values and logical operators. The
logical operators are <span class="smallcode">~</span> for logical negation, <span class="smallcode">&amp;</span> for logical and, <span class="smallcode">|</span> for
logical or. The negation has the highest priority. Logical value can be a
numerical expression, it will be false if it is equal to zero, otherwise it
will be true. Two numerical expressions can be compared using one of the
following operators to make the logical value: <span class="smallcode">=</span> (equal), <span class="smallcode">&lt;</span> (less),
<span class="smallcode">&gt;</span> (greater), <span class="smallcode">&lt;=</span> (less or equal), <span class="smallcode">&gt;=</span> (greater or equal),
<span class="smallcode">&lt;&gt;</span> (not equal).
</p>
<p class="smalltext">
The <span class="smallcode">used</span> operator followed by a symbol name, is the logical value that
checks whether the given symbol is used somewhere (it returns correct result
even if symbol is used only after this check). The <span class="smallcode">defined</span> operator can be
followed by any expression, usually just by a single symbol name; it checks
whether the given expression contains only symbols that are defined in the
source and accessible from the current position.
</p>
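<p class="smalltext">
A common application of the <span class="smallcode">used</span> operator is to include a routine only when
something refers to it. In the sketch below <span class="smallcode">helper</span> is a hypothetical label;
if the <span class="smallcode">call</span> line were removed, the routine would not be assembled at all.
</p>
<pre class="smallcode">    call helper

    if used helper
    helper:
        ret
    end if
</pre>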
<p class="smalltext">
The following simple example uses the <span class="smallcode">count</span> constant that should be
defined somewhere in source:
</p>
<pre class="smallcode">    if count&gt;0
        mov cx,count
        rep movsb
    end if
</pre>
<p class="smalltext">
These two assembly instructions will be assembled only if the <span class="smallcode">count</span> constant
is greater than 0. The next sample shows more complex conditional structure:
</p>
<pre class="smallcode">    if count &amp; ~ count mod 4
        mov cx,count/4
        rep movsd
    else if count&gt;4
        mov cx,count/4
        rep movsd
        mov cx,count mod 4
        rep movsb
    else
        mov cx,count
        rep movsb
    end if
</pre>
<p class="smalltext">
The first block of instructions gets assembled when the <span class="smallcode">count</span> is non zero and
divisible by four, if this condition is not met, the second logical expression,
which follows the <span class="smallcode">else if</span>, is evaluated and if it's true, the second block
of instructions get assembled, otherwise the last block of instructions, which
follows the line containing only <span class="smallcode">else</span>, is assembled.
</p>
<p class="smalltext">
There are also operators that allow comparison of values being any chains of
symbols. The <span class="smallcode">eq</span> compares two such values whether they are exactly the same.
The <span class="smallcode">in</span> operator checks whether given value is a member of the list of values
following this operator, the list should be enclosed between <span class="smallcode">&lt;</span> and <span class="smallcode">&gt;</span>
characters, its members should be separated with commas. The symbols are
considered the same when they have the same meaning for the assembler - for
example <span class="smallcode">pword</span> and <span class="smallcode">fword</span> for assembler are the same and thus are not
distinguished by the above operators. In the same way <span class="smallcode">16 eq 10h</span> is the true
condition, however <span class="smallcode">16 eq 10+6</span> is not, because the values are compared as
chains of symbols and are not evaluated.
</p>
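<p class="smalltext">
The cases mentioned above can be put directly into conditional blocks, as in
this sketch (the bytes defined inside serve only to mark which blocks get
assembled):
</p>
<pre class="smallcode">    if pword eq fword ; true, both names have the same meaning
        db 1
    end if
    if 16 eq 10+6     ; false, the chains of symbols differ
        db 2
    end if
    if 3 in &lt;1,2,3&gt;   ; true, 3 is a member of the list
        db 3
    end if
</pre>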
<p class="smalltext">
The <span class="smallcode">eqtype</span> operator checks whether the two compared values have the same
structure, and whether the structural elements are of the same type. The
distinguished types include numerical expressions, individual quoted strings,
floating point numbers, address expressions (the expressions enclosed in square
brackets or preceded by <span class="smallcode">ptr</span> operator), instruction mnemonics, registers, size
operators, jump type and code type operators. And each of the special
characters that act as a separators, like comma or colon, is the separate type
itself. For example, two values, each one consisting of register name followed
by comma and numerical expression, will be regarded as of the same type, no
matter what kind of register and how complicated numerical expression is used;
with exception for the quoted strings and floating point values, which are the
special kinds of numerical expressions and are treated as different types. Thus
<span class="smallcode">eax,16 eqtype fs,3+7</span> condition is true, but <span class="smallcode">eax,16 eqtype eax,1.6</span> is false.
</p>

<p><b>
<a name="2.2.3" class="smalltext">2.2.3  Repeating blocks of instructions</a>
</b></p>
<p class="smalltext">
<span class="smallcode">times</span> directive repeats one instruction specified number of times. It
should be followed by numerical expression specifying number of repeats and
the instruction to repeat (optionally colon can be used to separate number and
instruction). When special symbol <span class="smallcode">%</span> is used inside the instruction, it is
equal to the number of current repeat. For example <span class="smallcode">times 5 db %</span> will define
five bytes with values 1, 2, 3, 4, 5. Recursive use of <span class="smallcode">times</span> directive is
also allowed, so <span class="smallcode">times 3 times % db %</span> will define six bytes with values
1, 1, 2, 1, 2, 3.
</p>
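<p class="smalltext">
The <span class="smallcode">%</span> symbol can take part in any numerical expression. As a further sketch,
the first line below builds a small table of squares, and the second one uses
the optional colon:
</p>
<pre class="smallcode">    times 8 dw %*% ; words with values 1, 4, 9, 16, 25, 36, 49, 64
    times 4: db 0  ; four zero bytes
</pre>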
<p class="smalltext">
<span class="smallcode">repeat</span> directive repeats the whole block of instructions. It should be
followed by numerical expression specifying number of repeats. Instructions
to repeat are expected in next lines, ended with the <span class="smallcode">end repeat</span> directive,
for example:
</p>
<pre class="smallcode">    repeat 8
        mov byte [bx],%
        inc bx
    end repeat
</pre>
<p class="smalltext">
The generated code will store byte values from one to eight in the memory
addressed by BX register.
</p>
<p class="smalltext">
Number of repeats can be zero, in that case the instructions are not
assembled at all.
</p>
<p class="smalltext">
The <span class="smallcode">break</span> directive stops the repeating early and continues assembly
from the first line after the <span class="smallcode">end repeat</span>. Combined with the <span class="smallcode">if</span> directive it
makes it possible to stop repeating under some special condition, like:
</p>
<pre class="smallcode">    s = x/2
    repeat 100
        if x/s = s
            break
        end if
        s = (s+x/s)/2
    end repeat
</pre>
<p class="smalltext">
The <span class="smallcode">while</span> directive repeats the block of instructions as long as the
condition specified by the logical expression following it is true. The block
of instructions to be repeated should end with the <span class="smallcode">end while</span> directive.
Before each repetition the logical expression is evaluated and when its value
is false, the assembly is continued starting from the first line after the
<span class="smallcode">end while</span>. Also in this case the <span class="smallcode">%</span> symbol holds the number of current
repeat. The <span class="smallcode">break</span> directive can be used to stop this kind of loop in the same
way as with <span class="smallcode">repeat</span> directive. The previous sample can be rewritten to use the
<span class="smallcode">while</span> instead of <span class="smallcode">repeat</span> this way:
</p>
<pre class="smallcode">    s = x/2
    while x/s &lt;&gt; s
        s = (s+x/s)/2
        if % = 100
            break
        end if
    end while
</pre>
<p class="smalltext">
The blocks defined with <span class="smallcode">if</span>, <span class="smallcode">repeat</span> and <span class="smallcode">while</span> can be nested in any order,
however they should be closed in the same order in which they were started. The
<span class="smallcode">break</span> directive always stops processing the block that was started last with
either the <span class="smallcode">repeat</span> or <span class="smallcode">while</span> directive.
</p>
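<p class="smalltext">
The following sketch illustrates the nesting rules; it is only an
illustration, not a recommended way of defining data:
</p>
<pre class="smallcode">    repeat 3
        repeat 5
            if % &gt; 2
                break       ; leaves only the inner repeat
            end if
            db %            ; % is the counter of the inner repeat here
        end repeat
    end repeat
</pre>
<p class="smalltext">
The inner block is stopped on its third repetition each time, while the outer
one runs all three times, so six bytes with values 1, 2, 1, 2, 1, 2 are defined.
</p>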

<p><b>
<a name="2.2.4" class="smalltext">2.2.4  Addressing spaces</a>
</b></p>
<p class="smalltext">
<span class="smallcode">org</span> directive sets address at which the following code is expected to
appear in memory. It should be followed by numerical expression specifying
the address. This directive begins the new addressing space, the following
code itself is not moved in any way, but all the labels defined within it
and the value of <span class="smallcode">$</span> symbol are affected as if it was put at the given
address. However it's the responsibility of programmer to put the code at
correct address at run-time.
</p>
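<p class="smalltext">
For example, a DOS program in the .COM format is loaded at offset 100h of its
segment. A minimal sketch of such a program, assuming the default flat binary
output with 16-bit code:
</p>
<pre class="smallcode">    org 100h                ; DOS loads a .COM program at offset 100h
    mov dx,message          ; operand is 109h, not 9
    mov ah,9
    int 21h                 ; DOS function 9 - display string
    int 20h                 ; terminate program
    message db 'Hello!',24h ; 24h is the '$' character ending the string
</pre>
<p class="smalltext">
The <span class="smallcode">message</span> data is placed nine bytes from the beginning of the output file,
but because of the <span class="smallcode">org</span> directive its label has the value 109h.
</p>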
<p class="smalltext">
The <span class="smallcode">load</span> directive allows defining a constant with a binary value loaded
from the already assembled code. This directive should be followed by the name
of the constant, then optionally a size operator, then the <span class="smallcode">from</span> operator and a
numerical expression specifying a valid address in current addressing space.
The size operator has unusual meaning in this case - it states how many bytes
(up to 8) have to be loaded to form the binary value of constant. If no size
operator is specified, one byte is loaded (thus value is in range from 0 to
255). The loaded data cannot exceed current offset.
</p>
<p class="smalltext">
The <span class="smallcode">store</span> directive can modify the already generated code by replacing
some of the previously generated data with the value defined by the given
numerical expression, which follows it. The expression can be preceded by the
optional size operator to specify how large a value the expression defines, and
therefore how many bytes will be stored; if there is no size operator, the
size of one byte is assumed. Then the <span class="smallcode">at</span> operator and the numerical
expression defining a valid address in the current addressing space, at
which the given value has to be stored, should follow. This is a directive for
advanced applications and should be used carefully.
</p>
<p class="smalltext">
Both <span class="smallcode">load</span> and <span class="smallcode">store</span> directives are limited to operate on places in
current addressing space. The <span class="smallcode">$$</span> symbol is always equal to the base address
of current addressing space, and the <span class="smallcode">$</span> symbol is the address of current
position in that addressing space, therefore these two values define limits
of the area, where <span class="smallcode">load</span> and <span class="smallcode">store</span> can operate.
</p>
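<p class="smalltext">
A small sketch of both directives working together, where all the names
(<span class="smallcode">checksum</span>, <span class="smallcode">values</span>, <span class="smallcode">sum</span>, <span class="smallcode">b</span>) are hypothetical: a placeholder byte is
defined first and later replaced with the sum of the bytes that follow it.
</p>
<pre class="smallcode">    checksum db 0                       ; placeholder, patched below
    values db 1,2,3,4
    sum = 0
    repeat 4
        load b byte from values+%-1     ; read back an already assembled byte
        sum = sum + b
    end repeat
    store byte sum and 0FFh at checksum ; checksum byte becomes 10
</pre>
<p class="smalltext">
Both addresses used here lie between <span class="smallcode">$$</span> and <span class="smallcode">$</span> at the time the directives
are processed, as the rule above requires.
</p>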
<p class="smalltext">
Combining the <span class="smallcode">load</span> and <span class="smallcode">store</span> directives makes it possible to do things like encoding
some of the already generated code. For example to encode the whole code
generated in current addressing space you can use such block of directives:
</p>
<pre class="smallcode">    repeat $-$$
        load a byte from $$+%-1
        store byte a xor c at $$+%-1
    end repeat
</pre>
<p class="smalltext">
and each byte of code will be xored with the value defined by <span class="smallcode">c</span> constant.
</p>
<p class="smalltext">
<span class="smallcode">virtual</span> defines virtual data at specified address. This data won't be
included in the output file, but labels defined there can be used in other
parts of source. This directive can be followed by <span class="smallcode">at</span> operator and the
numerical expression specifying the address for virtual data, otherwise it
uses current address, the same as <span class="smallcode">virtual at $</span>. Instructions defining data
are expected in next lines, ended with <span class="smallcode">end virtual</span> directive. The block of
virtual instructions itself is an independent addressing space, after it's
ended, the context of previous addressing space is restored.
</p>
<p class="smalltext">
The <span class="smallcode">virtual</span> directive can be used to create union of some variables, for
example:
</p>
<pre class="smallcode">    GDTR dp ?
    virtual at GDTR
        GDT_limit dw ?
        GDT_address dd ?
    end virtual
</pre>
<p class="smalltext">
It defines two labels for parts of the 48-bit variable at <span class="smallcode">GDTR</span> address.
</p>
<p class="smalltext">
It can be also used to define labels for some structures addressed by a
register, for example:
</p>
<pre class="smallcode">    virtual at bx
        LDT_limit dw ?
        LDT_address dd ?
    end virtual
</pre>
<p class="smalltext">
With such definition instruction <span class="smallcode">mov ax,[LDT_limit]</span> will be assembled
to <span class="smallcode">mov ax,[bx]</span>.
</p>
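<p class="smalltext">
A related sketch, using hypothetical names, places the virtual block at address
zero, so that the labels become plain offsets of the fields and can be combined
with any register:
</p>
<pre class="smallcode">    virtual at 0
        POINT.x dw ?
        POINT.y dw ?
        sizeof.POINT = $        ; 4, the size of the whole structure
    end virtual

    mov ax,[bx+POINT.y]         ; assembled as mov ax,[bx+2]
</pre>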
<p class="smalltext">
Declaring defined data values or instructions inside the virtual block can
also be useful, because the <span class="smallcode">load</span> directive can be used to load the values
from the virtually generated code into constants. This directive should be
used after the code it loads but before the virtual block ends, because it can
only load the values from the same addressing space. For example:
</p>
<pre class="smallcode">    virtual at 0
        xor eax,eax
        and edx,eax
        load zeroq dword from 0
    end virtual
</pre>
<p class="smalltext">
The above piece of code will define the <span class="smallcode">zeroq</span> constant containing four bytes
of the machine code of the instructions defined inside the virtual block.
This method can be also used to load some binary value from external file.
For example this code:
</p>
<pre class="smallcode">    virtual at 0
        file 'a.txt':10h,1
        load char from 0
    end virtual
</pre>
<p class="smalltext">
loads the single byte from offset 10h in file <span class="smallcode">a.txt</span> into the <span class="smallcode">char</span>
constant.
</p>
<p class="smalltext">
Any of the <span class="smallcode">section</span> directives described in <a href="#2.4">2.4</a> also begins a new
addressing space.
</p>

<p><b>
<a name="2.2.5" class="smalltext">2.2.5  Other directives</a>
</b></p>
<p class="smalltext">
<span class="smallcode">align</span> directive aligns code or data to the specified boundary. It should
be followed by a numerical expression specifying the number of bytes, to the
multiple of which the current address has to be aligned. The boundary value
has to be a power of two.
</p>
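<p class="smalltext">
A minimal sketch, assuming the fragment starts at address 0 and using a
hypothetical <span class="smallcode">table</span> label:
</p>
<pre class="smallcode">    db 1,2,3                ; three bytes at addresses 0 to 2
    align 4                 ; skips one byte to reach address 4
    table dd 10,20,30       ; table starts at a multiple of 4
</pre>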
<p class="smalltext">
The <span class="smallcode">align</span> directive fills the bytes that had to be skipped to perform the
alignment with the <span class="smallcode">nop</span> instructions and at the same time marks this area as
uninitialized data, so if it is placed among other uninitialized data that
wouldn't take space in the output file, the alignment bytes will act the same
way. If you need to fill the alignment area with some other values, you can
combine <span class="smallcode">align</span> with <span class="smallcode">virtual</span> to get the size of alignment needed and then
create the alignment yourself, like:
</p>
<pre class="smallcode">    virtual
        align 16
        a = $ - $$
    end virtual
    db a dup 0
</pre>
<p class="smalltext">
The <span class="smallcode">a</span> constant is defined to be the difference between address after alignment
and address of the <span class="smallcode">virtual</span> block (see previous section), so it is equal to
the size of needed alignment space.
</p>
<p class="smalltext">
<span class="smallcode">display</span> directive displays the message at the assembly time. It should
be followed by the quoted strings or byte values, separated with commas. It
can be used to display values of some constants, for example:
</p>
<pre class="smallcode">    bits = 16
    display 'Current offset is 0x'
    repeat bits/4
        d = '0' + $ shr (bits-%*4) and 0Fh
        if d &gt; '9'
            d = d + 'A'-'9'-1
        end if
        display d
    end repeat
    display 13,10
</pre>
<p class="smalltext">
This block of directives calculates the four hexadecimal digits of 16-bit value
and converts them into characters for displaying. Note that this won't work if
the addresses in current addressing space are relocatable (as it might happen with
PE or object output formats), since only absolute values can be used this way.
The absolute value may be obtained by calculating the relative address, like
<span class="smallcode">$-$$</span>, or <span class="smallcode">rva $</span> in case of PE format.
</p>
<p class="smalltext">
The <span class="smallcode">err</span> directive immediately terminates the assembly process when it is
encountered by assembler.
</p>
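<p class="smalltext">
It is usually placed inside a conditional block. A sketch with a hypothetical
<span class="smallcode">version</span> constant:
</p>
<pre class="smallcode">    version = 2             ; hypothetical configuration constant
    if version &lt; 1 | version &gt; 3
        err                 ; unsupported value, stop the assembly
    end if
</pre>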

<p><b>
<a name="2.2.6" class="smalltext">2.2.6  Multiple passes</a>
</b></p>
<p class="smalltext">
Because the assembler allows referencing some of the labels or constants
before they actually get defined, it has to predict the values of such labels
and if there is even a suspicion that prediction failed in at least one case,
it does one more pass, assembling the whole source, this time doing better
prediction based on the values the labels got in the previous pass.
</p>
<p class="smalltext">
The changing values of labels can cause some instructions to have encodings
of different length, and this can cause the change in values of labels again.
And since the labels and constants can also be used inside the expressions that
affect the behavior of control directives, the whole block of source can be
processed completely differently during the new pass. Thus the assembler does
more and more passes, each time trying to do better predictions to approach
the final solution, when all the values get predicted correctly. It uses
various methods for predicting the values, which have been chosen to allow
finding in a few passes the solution of possibly smallest length for most
programs.
</p>
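<p class="smalltext">
A simple sketch of code where the length of an instruction depends on a label
defined later (the <span class="smallcode">skip</span> label is a hypothetical name):
</p>
<pre class="smallcode">    jmp skip                ; skip is not yet defined when this line is reached
    db 200 dup 0            ; 200 bytes of data in between
    skip:
</pre>
<p class="smalltext">
The target lies too far to be reached with the short form of jump, which can
only go up to 127 bytes forward, so the longer encoding is needed here. If the
data in between was shorter, the short form would fit, and the address of <span class="smallcode">skip</span>
would change accordingly.
</p>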
<p class="smalltext">
Some of the errors, like the values not fitting in required boundaries, are
not signaled during those intermediate passes, since it may happen that when
some of the values are predicted better, these errors will disappear. However
if assembler meets some illegal syntax construction or unknown instruction, it
always stops immediately. Also defining some label more than once causes such
error, because it makes the predictions groundless.
</p>
<p class="smalltext">
Only the messages created
with the <span class="smallcode">display</span> directive during the last performed pass get actually
displayed. In case when the assembly has been
stopped due to an error, these messages may reflect the predicted values that
are not yet resolved correctly.
</p>
<p class="smalltext">
The solution may sometimes not exist and in such cases the assembler will
never manage to make correct predictions - for this reason there is a limit for
a number of passes, and when assembler reaches this limit, it stops and displays
the message that it is not able to generate the correct output. Consider the
following example:
</p>
<pre class="smallcode">    if ~ defined alpha
        alpha:
    end if
</pre>
<p class="smalltext">
The <span class="smallcode">defined</span> operator gives the true value when the expression following it
could be calculated in this place, which in this case means that the <span class="smallcode">alpha</span>
label is defined somewhere. But the above block causes this label to be defined
only when the value given by <span class="smallcode">defined</span> operator is false, which leads to an
antinomy and makes it impossible to resolve such code. When processing the <span class="smallcode">if</span>
directive assembler has to predict whether the <span class="smallcode">alpha</span> label will be defined
somewhere (it wouldn't have to predict only if the label was already defined
earlier in this pass), and whatever the prediction is, the opposite always
happens. Thus the assembly will fail, unless the <span class="smallcode">alpha</span> label is defined
somewhere in source preceding the above block of instructions - in such case,
as it was already noted, the prediction is not needed and the block will just
get skipped.
</p>
<p class="smalltext">
The above sample might have been written as an attempt to define the label only
when it was not yet defined. It fails, because the <span class="smallcode">defined</span> operator does
check whether the label is defined anywhere, and this includes the definition
inside this conditionally processed block. However adding some additional
condition may make it possible to get it resolved:
</p>
<pre class="smallcode">    if ~ defined alpha | defined @f
        alpha:
        @@:
    end if
</pre>
<p class="smalltext">
The <span class="smallcode">@f</span> is always the same label as the nearest <span class="smallcode">@@</span> symbol in the source
following it, so the above sample would mean the same if any unique name was
used instead of the anonymous label. When <span class="smallcode">alpha</span> is not defined in any other
place in source, the only possible solution is when this block gets defined,
and this time this doesn't lead to the antinomy, because of the anonymous
label which makes this block self-establishing. To better understand this,
look at a block that has nothing more than this self-establishing:
</p>
<pre class="smallcode">    if defined @f
        @@:
    end if
</pre>
<p class="smalltext">
This is an example of source that may have more than one solution, as both
cases when this block gets processed or not are equally correct. Which one of
those two solutions we get depends on the algorithm of the assembler, in case
of flat assembler - on the algorithm of predictions. Back to the previous
sample, when <span class="smallcode">alpha</span> is not defined anywhere else, the condition for <span class="smallcode">if</span> block
cannot be false, so we are left with only one possible solution, and we can
hope the assembler will arrive at it. On the other hand, when <span class="smallcode">alpha</span> is
defined in some other place, we've got two possible solutions again, but one of
them causes <span class="smallcode">alpha</span> to be defined twice, and such an error causes assembler to
abort the assembly immediately, as this is the kind of error that deeply
disturbs the process of resolving. So we can get such source either correctly
resolved or causing an error, and what we get may depend on the internal
choices made by the assembler.
</p>
<p class="smalltext">
However there are some facts about such choices that are certain. When
assembler has to check whether the given symbol is defined and it was already
defined in the current pass, no prediction is needed - it was already noted
above. And when the given symbol has never been defined before, including all
the already finished passes, the assembler predicts it to be not defined.
Knowing this, we can expect that the simple self-establishing block shown
above will not be assembled at all and that the previous sample will resolve
correctly when <span class="smallcode">alpha</span> is defined somewhere before our conditional block,
while it will itself define <span class="smallcode">alpha</span> when it's not already defined earlier, thus
potentially causing the error because of double definition if the <span class="smallcode">alpha</span> is
also defined somewhere later.
</p>
<p class="smalltext">
The <span class="smallcode">used</span> operator may be expected to behave in a similar manner in
analogous cases, however any other kinds of predictions may not be so simple and
you should never rely on them this way.
</p>
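<p class="smalltext">
A sketch of how the <span class="smallcode">used</span> operator is commonly applied, with a hypothetical
routine name: the routine is assembled only when its label is referenced
somewhere in the source.
</p>
<pre class="smallcode">    if used clear_ax
    clear_ax:
        xor ax,ax
        ret
    end if
</pre>
<p class="smalltext">
When no instruction refers to <span class="smallcode">clear_ax</span>, the whole block should get skipped and
take no space in the output.
</p>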
<p class="smalltext">
The <span class="smallcode">err</span> directive, usually used to stop the assembly when some condition is
met, stops the assembly immediately, regardless of whether the current pass
is final or intermediate. So even when the condition that caused this directive
to be interpreted is temporary, and would eventually disappear in the later
passes, the assembly is stopped anyway. If it's needed to stop the assembly only
when the condition is permanent, and not just occurring in the intermediate
assembly passes, the trick with <span class="smallcode">rb -1</span> can be used instead. The <span class="smallcode">rb</span> directive
does not cause an error when it is provided with negative value in the
intermediate passes.
</p>
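<p class="smalltext">
A sketch of this trick, assuming that the code generated so far must not exceed
510 bytes (the limit is only an assumption of this example):
</p>
<pre class="smallcode">    if $-$$ &gt; 510
        rb -1               ; tolerated in intermediate passes
    end if
</pre>
<p class="smalltext">
If the condition is still true when the values are finally resolved, the
negative value given to <span class="smallcode">rb</span> is expected to be reported as an error.
</p>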


<p><b>
<a name="2.3" class="mediumtext">2.3  Preprocessor directives</a>
</b></p>

<p class="smalltext">
All preprocessor directives are processed before the main assembly process,
and therefore are not affected by the control directives. At this time also
all comments are stripped out.
</p>

<p><b>
<a name="2.3.1" class="smalltext">2.3.1  Including source files</a>
</b></p>

<p class="smalltext">
<span class="smallcode">include</span> directive includes the specified source file at the position where
it is used. It should be followed by the quoted name of file that should be
included, for example:
</p>
<pre class="smallcode">    include 'macros.inc'
</pre>
<p class="smalltext">
The whole included file is preprocessed before preprocessing the lines next
to the line containing the <span class="smallcode">include</span> directive. There are no limits to the
number of included files as long as they fit in memory.
</p>
<p class="smalltext">
The quoted path can contain environment variables enclosed within <span class="smallcode">%</span>
characters; they will be replaced with their values inside the path. Both the
<span class="smallcode">\</span> and <span class="smallcode">/</span> characters are allowed as path separators.
If no absolute path is given, the file is first searched for in the directory
containing the file which included it and, when it's not found there, in the
directory containing the main source file (the one specified in command line).
These rules also apply to paths given with the <span class="smallcode">file</span> directive.
</p>
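<p class="smalltext">
For illustration, assuming a hypothetical environment variable <span class="smallcode">FASMINC</span> and
hypothetical file names:
</p>
<pre class="smallcode">    include '%FASMINC%/macros.inc'  ; FASMINC is replaced with its value
    include 'common\defs.inc'       ; relative path, searched as described above
</pre>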

<p><b>
<a name="2.3.2" class="smalltext">2.3.2  Symbolic constants</a>
</b></p>

<p class="smalltext">
The symbolic constants are different from the numerical constants: before the
assembly process they are replaced with their values everywhere in source
lines after their definitions, and anything can become their values.
</p>
<p class="smalltext">
The definition of symbolic constant consists of name of the constant
followed by the <span class="smallcode">equ</span> directive. Everything that follows this directive will
become the value of constant. If the value of symbolic constant contains
other symbolic constants, they are replaced with their values before assigning
this value to the new constant. For example:
</p>
<pre class="smallcode">    d equ dword
    NULL equ d 0
    d equ edx
</pre>
<p class="smalltext">
After these three definitions the value of <span class="smallcode">NULL</span> constant is <span class="smallcode">dword 0</span> and
the value of <span class="smallcode">d</span> is <span class="smallcode">edx</span>. So, for example, <span class="smallcode">push NULL</span> will be assembled as
<span class="smallcode">push dword 0</span> and <span class="smallcode">push d</span> will be assembled as <span class="smallcode">push edx</span>.
And if then the following line was put:
</p>
<pre class="smallcode">    d equ d,eax
</pre>
<p class="smalltext">
the <span class="smallcode">d</span> constant would get the new value of <span class="smallcode">edx,eax</span>. This way the growing
lists of symbols can be defined.
</p>
<p class="smalltext">
<span class="smallcode">restore</span> directive allows getting back the previous value of a redefined symbolic
constant. It should be followed by one or more names of symbolic constants,
separated with commas. So <span class="smallcode">restore d</span> after the above definitions will give
<span class="smallcode">d</span> constant back the value <span class="smallcode">edx</span>, the second one will restore it to value
<span class="smallcode">dword</span>, and one more will revert <span class="smallcode">d</span> to original meaning as if no such
constant was defined. If there was no constant defined of given name,
<span class="smallcode">restore</span> won't cause an error, it will be just ignored.
</p>
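<p class="smalltext">
A separate, self-contained sketch of how the definitions stack up and get
restored:
</p>
<pre class="smallcode">    d equ dword
    d equ edx
    push d              ; push edx
    restore d
    push d 0            ; push dword 0
    restore d           ; d is no longer a symbolic constant
    restore d           ; nothing left to restore, silently ignored
</pre>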
<p class="smalltext">
Symbolic constant can be used to adjust the syntax of assembler to personal
preferences. For example the following set of definitions provides the handy
shortcuts for all the size operators:
</p>
<pre class="smallcode">    b equ byte
    w equ word
    d equ dword
    p equ pword
    f equ fword
    q equ qword
    t equ tword
    x equ dqword
</pre>
<p class="smalltext">
Because symbolic constant may also have an empty value, it can be used to
allow the syntax with <span class="smallcode">offset</span> word before any address value:
</p>
<pre class="smallcode">    offset equ
</pre>
<p class="smalltext">
After this definition <span class="smallcode">mov ax,offset char</span> will be valid construction for
copying the offset of <span class="smallcode">char</span> variable into <span class="smallcode">ax</span> register, because <span class="smallcode">offset</span> is
replaced with an empty value, and therefore ignored.
</p>
<p class="smalltext">
The <span class="smallcode">define</span> directive followed by the name of constant and then the value,
is the alternative way of defining symbolic constant. The only difference
between <span class="smallcode">define</span> and <span class="smallcode">equ</span> is that
<span class="smallcode">define</span> assigns the value as it is,
it does not replace the symbolic constants with their values inside it.
</p>
<p class="smalltext">
Symbolic constants can also be defined with the <span class="smallcode">fix</span> directive, which has
the same syntax as <span class="smallcode">equ</span>, but defines constants of high priority - they are
replaced with their symbolic values even before processing the preprocessor
directives and macroinstructions, the only exception is <span class="smallcode">fix</span> directive
itself, which has the highest possible priority, so it allows redefinition of
constants defined this way.
</p>
<p class="smalltext">
The <span class="smallcode">fix</span> directive can be used for syntax adjustments related to directives
of preprocessor, which cannot be done with <span class="smallcode">equ</span> directive. For example:
</p>
<pre class="smallcode">    incl fix include
</pre>
<p class="smalltext">
defines a short name for <span class="smallcode">include</span> directive, while the similar definition done
with <span class="smallcode">equ</span> directive wouldn't give such result, as standard symbolic constants
are replaced with their values after searching the line for preprocessor
directives.
</p>

<p><b>
<a name="2.3.3" class="smalltext">2.3.3  Macroinstructions</a>
</b></p>

<p class="smalltext">
<span class="smallcode">macro</span> directive allows you to define your own complex instructions, called
macroinstructions, the use of which can greatly simplify the process of
programming. In its simplest form it's similar to symbolic constant
definition. For example the following definition defines a shortcut for the
<span class="smallcode">test al,0xFF</span> instruction:
</p>
<pre class="smallcode">    macro tst {test al,0xFF}
</pre>
<p class="smalltext">
After the <span class="smallcode">macro</span> directive there is a name of macroinstruction and then its
contents enclosed between the <span class="smallcode">{</span> and <span class="smallcode">}</span> characters. You can use <span class="smallcode">tst</span>
instruction anywhere after this definition and it will be assembled as
<span class="smallcode">test al,0xFF</span>. Defining symbolic constant <span class="smallcode">tst</span> of that value would give the
similar result, but the difference is that the name of macroinstruction is
recognized only as an instruction mnemonic. Also, macroinstructions are
replaced with corresponding code even before the symbolic constants are
replaced with their values. So if you define macroinstruction and symbolic
constant of the same name, and use this name as an instruction mnemonic, it
will be replaced with the contents of macroinstruction, but it will be
replaced with the value of symbolic constant if used somewhere inside the
operands.
</p>
<p class="smalltext">
The definition of macroinstruction can consist of many lines, because
<span class="smallcode">{</span> and <span class="smallcode">}</span> characters don't have to be in the same line as <span class="smallcode">macro</span> directive.
For example:
</p>
<pre class="smallcode">    macro stos0
     {
        xor al,al
        stosb
     }
</pre>
<p class="smalltext">
The macroinstruction <span class="smallcode">stos0</span> will be replaced with these two assembly
instructions anywhere it's used.
</p>
<p class="smalltext">
Like instructions which need some number of operands, the macroinstruction
can be defined to need some number of arguments separated with commas. The
names of needed arguments should follow the name of macroinstruction in the
line of <span class="smallcode">macro</span> directive and should be separated with commas if there is more
than one. Anywhere one of these names occurs in the contents of
macroinstruction, it will be replaced with corresponding value, provided when
the macroinstruction is used. Here is an example of a macroinstruction that
will do data alignment for binary output format:
</p>
<pre class="smallcode">    macro align value { rb (value-1)-($+value-1) mod value }
</pre>
<p class="smalltext">
When the <span class="smallcode">align 4</span> instruction is found after this macroinstruction is
defined, it will be replaced with contents of this macroinstruction, and the
<span class="smallcode">value</span> will there become 4, so the result will be <span class="smallcode">rb (4-1)-($+4-1) mod 4</span>.
</p>
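<p class="smalltext">
To see that the formula gives the right amount of padding, it can be worked
through by hand. The sketch below assumes the current address is 5 and relies on
<span class="smallcode">mod</span> having higher priority than addition and subtraction; <span class="smallcode">pad</span> is a
hypothetical name:
</p>
<pre class="smallcode">    org 0
    db 5 dup 0                      ; five bytes, so $ is now 5
    pad = (4-1)-($+4-1) mod 4       ; 3-(8 mod 4) = 3-0 = 3
    rb pad                          ; $ becomes 8, a multiple of 4
</pre>
<p class="smalltext">
When the address is already aligned, for instance 8, the same formula gives
3-(11 mod 4) = 0 and no bytes are reserved.
</p>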
<p class="smalltext">
If a macroinstruction is defined that uses an instruction with the same name
inside its definition, the previous meaning of this name is used. Useful
redefinition of macroinstructions can be done in that way, for example:
</p>
<pre class="smallcode">    macro mov op1,op2
     {
      if op1 in &lt;ds,es,fs,gs,ss&gt; &amp; op2 in &lt;cs,ds,es,fs,gs,ss&gt;
        push  op2
        pop   op1
      else
        mov   op1,op2
      end if
     }
</pre>
<p class="smalltext">
This macroinstruction extends the syntax of <span class="smallcode">mov</span> instruction, allowing both
operands to be segment registers. For example <span class="smallcode">mov ds,es</span> will be assembled as
<span class="smallcode">push es</span> and <span class="smallcode">pop ds</span>. In all other cases the standard <span class="smallcode">mov</span> instruction will
be used. The syntax of this <span class="smallcode">mov</span> can be extended further by defining next
macroinstruction of that name, which will use the previous macroinstruction:
</p>
<pre class="smallcode">    macro mov op1,op2,op3
     {
      if op3 eq
        mov   op1,op2
      else
        mov   op1,op2
        mov   op2,op3
      end if
     }
</pre>
<p class="smalltext">
It allows <span class="smallcode">mov</span> instruction to have three operands, but it can still have two
operands only, because when macroinstruction is given fewer arguments than it
needs, the rest of arguments will have empty values. When three operands are
given, this macroinstruction will become two macroinstructions of the previous
definition, so <span class="smallcode">mov es,ds,dx</span> will be assembled as <span class="smallcode">push ds</span>, <span class="smallcode">pop es</span> and
<span class="smallcode">mov ds,dx</span>.
</p>
<p class="smalltext">
By placing the <span class="smallcode">*</span> after the name of argument you can mark the argument as
required - preprocessor won't allow it to have an empty value. For example the
above macroinstruction could be declared as <span class="smallcode">macro mov op1*,op2*,op3</span> to make
sure that first two arguments will always have to be given some non-empty
values.
</p>
<p class="smalltext">
When it's needed to provide macroinstruction with argument that contains
some commas, such argument should be enclosed between <span class="smallcode">&lt;</span> and <span class="smallcode">&gt;</span> characters.
If it contains more than one <span class="smallcode">&lt;</span> character, the same number of <span class="smallcode">&gt;</span> should be
used to tell that the value of argument ends.
</p>
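<p class="smalltext">
For example, with a hypothetical <span class="smallcode">defstr</span> macroinstruction that defines a
zero-ended string:
</p>
<pre class="smallcode">    macro defstr name,contents { name db contents,0 }

    defstr greeting,&lt;'Hello',13,10&gt;  ; greeting db 'Hello',13,10,0
</pre>
<p class="smalltext">
Without the enclosing characters the values 13 and 10 would be treated as
separate arguments and only <span class="smallcode">'Hello'</span> would become the value of <span class="smallcode">contents</span>.
</p>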
<p class="smalltext">
<span class="smallcode">purge</span> directive allows removing the last definition of specified
macroinstruction. It should be followed by one or more names of
macroinstructions, separated with commas. If such macroinstruction has not
been defined, you won't get any error. For example after having the syntax of
<span class="smallcode">mov</span> extended with the macroinstructions defined above, you can disable
syntax with three operands back by using <span class="smallcode">purge mov</span> directive. Next
<span class="smallcode">purge mov</span> will disable also syntax for two operands being segment registers,
and all the next such directives will do nothing.
</p>
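<p class="smalltext">
The same mechanism can be shown with a smaller, self-contained sketch, in which
an instruction is temporarily replaced by a macroinstruction of the same name:
</p>
<pre class="smallcode">    macro cli { nop }       ; replace cli for the following lines
    cli                     ; assembled as nop
    purge cli
    cli                     ; assembled as the cli instruction again
</pre>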
<p class="smalltext">
If after the <span class="smallcode">macro</span> directive you enclose some group of arguments' names in
square brackets, it will allow giving more values for this group of arguments
when using that macroinstruction. Any further argument given after the last
argument of such group will begin the new group and will become the first
argument of it. That's why after closing the square bracket no more argument
names can follow. The contents of macroinstruction will be processed for each
such group of arguments separately. The simplest example is to enclose one
argument name in square brackets:
</p>
<pre class="smallcode">    macro stoschar [char]
     {
        mov al,char
        stosb
     }
</pre>
<p class="smalltext">
This macroinstruction accepts unlimited number of arguments, and each one
will be processed into these two instructions separately. For example
<span class="smallcode">stoschar 1,2,3</span> will be assembled as the following instructions:
</p>
<pre class="smallcode">    mov al,1
    stosb
    mov al,2
    stosb
    mov al,3
    stosb
</pre>
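<p class="smalltext">
A group may also contain more than one argument name. The following sketch
(with the hypothetical name <span class="smallcode">setregs</span>) takes its values in pairs, each pair
forming one group:
</p>
<pre class="smallcode">    macro setregs [reg,value]
     {
        mov reg,value
     }

    setregs eax,1,ebx,2     ; first group: eax,1  second group: ebx,2
</pre>
<p class="smalltext">
which should be assembled as <span class="smallcode">mov eax,1</span> followed by <span class="smallcode">mov ebx,2</span>.
</p>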
<p class="smalltext">
There are some special directives available only inside the definitions of
macroinstructions. <span class="smallcode">local</span> directive defines local names, which will be
replaced with unique values each time the macroinstruction is used. It should
be followed by names separated with commas. If the name given as parameter to <span class="smallcode">local</span> directive begins with a dot or two
dots, the unique labels generated by each evaluation of macroinstruction will
have the same properties. This directive is usually needed
for the constants or labels that macroinstruction defines and uses internally.
For example:
</p>
<pre class="smallcode">    macro movstr
     {
        local move
      move:
        lodsb
        stosb
        test al,al
        jnz move
     }
</pre>
<p class="smalltext">
Each time this macroinstruction is used, <span class="smallcode">move</span> will become a different unique name
in its instructions, so you won't get an error you normally get when some
label is defined more than once.
</p>
<p class="smalltext">
<span class="smallcode">forward</span>, <span class="smallcode">reverse</span> and <span class="smallcode">common</span> directives divide macroinstruction into
blocks, each one processed after the processing of previous is finished. They
differ in behavior only if macroinstruction allows multiple groups of
arguments. Block of instructions that follows <span class="smallcode">forward</span> directive is processed
for each group of arguments, from first to last - exactly like the default
block (not preceded by any of these directives). Block that follows <span class="smallcode">reverse</span>
directive is processed for each group of arguments in reverse order - from last
to first. Block that follows <span class="smallcode">common</span> directive is processed only once,
commonly for all groups of arguments. Local name defined in one of the blocks
is available in all the following blocks when processing the same group of
arguments as when it was defined, and when it is defined in common block it is
available in all the following blocks not depending on which group of
arguments is processed.
</p>
<p class="smalltext">
Here is an example of macroinstruction that will create the table of
addresses to strings followed by these strings:
</p>
<pre class="smallcode">    macro strtbl name,[string]
     {
      common
        label name dword
      forward
        local label
        dd label
      forward
        label db string,0
     }
</pre>
<p class="smalltext">
First argument given to this macroinstruction will become the label for table
of addresses, next arguments should be the strings. First block is processed
only once and defines the label, second block for each string declares its
local name and defines the table entry holding the address to that string.
Third block defines the data of each string with the corresponding label.
</p>
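<p class="smalltext">
A usage sketch may make the three blocks easier to follow. The names
<span class="smallcode">messages</span>, <span class="smallcode">str1</span> and <span class="smallcode">str2</span> are assumptions made for illustration; the
real local labels get unique names generated by the preprocessor:
</p>
<pre class="smallcode">    strtbl messages,'Hello','World'

    ; roughly equivalent to writing by hand:
    ;
    ;   label messages dword
    ;   dd str1
    ;   dd str2
    ;   str1 db 'Hello',0
    ;   str2 db 'World',0
</pre>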
<p class="smalltext">
The directive starting the block in macroinstruction can be followed by the
first instruction of this block in the same line, like in the following
example:
</p>
<pre class="smallcode">    macro stdcall proc,[arg]
     {
      reverse push arg
      common call proc
     }
</pre>
<p class="smalltext">
This macroinstruction can be used for calling the procedures using STDCALL
convention, arguments are pushed on stack in the reverse order. For example
<span class="smallcode">stdcall foo,1,2,3</span> will be assembled as:
</p>
<pre class="smallcode">    push 3
    push 2
    push 1
    call foo
</pre>
<p class="smalltext">
If some name inside macroinstruction has multiple values (it is either one
of the arguments enclosed in square brackets or local name defined in the
block following <span class="smallcode">forward</span> or <span class="smallcode">reverse</span> directive) and is used in block
following the <span class="smallcode">common</span> directive, it will be replaced with all of its values,
separated with commas. For example the following macroinstruction will pass
all of the additional arguments to the previously defined <span class="smallcode">stdcall</span>
macroinstruction:
</p>
<pre class="smallcode">    macro invoke proc,[arg]
     { common stdcall [proc],arg }
</pre>
<p class="smalltext">
It can be used to call indirectly (by the pointer stored in memory) the
procedure using STDCALL convention.
</p>
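<p class="smalltext">
For illustration, assuming <span class="smallcode">procptr</span> is a variable holding the address of some
procedure (a hypothetical name, not defined in this manual), the expansion
goes through <span class="smallcode">stdcall</span> like this:
</p>
<pre class="smallcode">    invoke procptr,1,2,3

    ; becomes: stdcall [procptr],1,2,3
    ; which is assembled as:
    ;
    ;   push 3
    ;   push 2
    ;   push 1
    ;   call [procptr]
</pre>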
<p class="smalltext">
The special operator <span class="smallcode">#</span> can also be used inside a macroinstruction. This
operator causes two names to be concatenated into one name. It can be useful,
because it's done after the arguments and local names are replaced with their
values. The following macroinstruction will generate the conditional jump
according to the <span class="smallcode">cond</span> argument:
</p>
<pre class="smallcode">    macro jif op1,cond,op2,label
     {
        cmp op1,op2
        j#cond label
     }
</pre>
<p class="smalltext">
For example <span class="smallcode">jif ax,ae,10h,exit</span> will be assembled as <span class="smallcode">cmp ax,10h</span> and
<span class="smallcode">jae exit</span> instructions.
</p>
<p class="smalltext">
The <span class="smallcode">#</span> operator can be also used to concatenate two quoted strings into one.
Also conversion of name into a quoted string is possible, with the <span class="smallcode">`</span> operator,
which likewise can be used inside the macroinstruction. It converts the name
that follows it into a quoted string - but note, that when it is followed by
a macro argument which is being replaced with value containing more than one
symbol, only the first of them will be converted, as the <span class="smallcode">`</span> operator converts
only one symbol that immediately follows it. Here's an example of utilizing
those two features:
</p>
<pre class="smallcode">    macro label name
     {
        label name
        if ~ used name
          display `name # " is defined but not used.",13,10
        end if
     }
</pre>
<p class="smalltext">
When label defined with such macro is not used in the source, macro will warn
you with the message, informing to which label it applies.
</p>
<p class="smalltext">
To make a macroinstruction behave differently when some of the arguments are
of some special type, for example a quoted string, you can use the <span class="smallcode">eqtype</span>
comparison operator. Here's an example of utilizing it to distinguish a
quoted string from any other argument:
</p>
<pre class="smallcode">    macro message arg
     {
      if arg eqtype ""
        local str
        jmp   @f
        str   db arg,0Dh,0Ah,24h
        @@:
        mov   dx,str
      else
        mov   dx,arg
      end if
        mov   ah,9
        int   21h
     }
</pre>
<p class="smalltext">
The above macro is designed for displaying messages in DOS programs. When the
argument of this macro is some number, label, or variable, the string from
that address is displayed, but when the argument is a quoted string, the
created code will display that string followed by the carriage return and
line feed.
</p>
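<p class="smalltext">
Both forms of use can be sketched as follows, assuming <span class="smallcode">goodbye</span> is a label
defined elsewhere in the program (a hypothetical name) that points at a string
terminated with the <span class="smallcode">24h</span> character:
</p>
<pre class="smallcode">    message 'Hello!'     ; quoted string: data is placed inline and skipped with jmp
    message goodbye      ; anything else: treated as the address of a string

    ; somewhere in the data:
    ;   goodbye db 'Bye!',24h
</pre>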
<p class="smalltext">
It is also possible to put a declaration of macroinstruction inside another
macroinstruction, so one macro can define another, but there is a problem
with such definitions caused by the fact, that <span class="smallcode">}</span> character cannot occur
inside the macroinstruction, as it always means the end of definition. To
overcome this problem, the escaping of symbols inside macroinstruction can be
used. This is done by placing one or more backslashes in
front of any other symbol (even the special character). Preprocessor sees such
sequence as a single symbol, but each time it meets such symbol during the
macroinstruction processing, it cuts the backslash character from the front of
it. For example <span class="smallcode">\}</span> is treated as single symbol, but during processing of the
macroinstruction it becomes the <span class="smallcode">}</span> symbol. This allows to put one definition
of macroinstruction inside another:
</p>
<pre class="smallcode">    macro ext instr
     {
      macro instr op1,op2,op3
       \{
        if op3 eq
          instr op1,op2
        else
          instr op1,op2
          instr op2,op3
        end if
       \}
     }

    ext add
    ext sub
</pre>
<p class="smalltext">
The macro <span class="smallcode">ext</span> is defined correctly, but when it is used, the <span class="smallcode">\{</span> and <span class="smallcode">\}</span>
become the <span class="smallcode">{</span> and <span class="smallcode">}</span> symbols. So when the <span class="smallcode">ext add</span> is processed, the
contents of macro becomes valid definition of a macroinstruction and this way
the <span class="smallcode">add</span> macro becomes defined. In the same way <span class="smallcode">ext sub</span> defines the <span class="smallcode">sub</span>
macro. The use of <span class="smallcode">\{</span> symbol wasn't really necessary here, but it's done this
way to make the definition more clear.
</p>
<p class="smalltext">
If some directives specific to macroinstructions, like <span class="smallcode">local</span> or <span class="smallcode">common</span>
are needed inside some macro embedded this way, they can be escaped in the same
way. Escaping the symbol with more than one backslash is also allowed, which
allows multiple levels of nesting the macroinstruction definitions.
</p>
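<p class="smalltext">
The following sketch shows an escaped <span class="smallcode">local</span> directive inside an embedded
definition. The names <span class="smallcode">defdelay</span>, <span class="smallcode">short_delay</span> and <span class="smallcode">again</span> are made up for
this illustration:
</p>
<pre class="smallcode">    macro defdelay name
     {
       macro name count
        \{
          \local again          ; becomes "local again" in the inner macro
          mov ecx,count
         again:
          loop again
        \}
     }

    defdelay short_delay        ; defines the short_delay macroinstruction
    short_delay 100             ; each use gets its own unique label
    short_delay 200
</pre>
<p class="smalltext">
Without the backslash the <span class="smallcode">local</span> directive would be processed by the outer
macroinstruction instead of becoming part of the inner one.
</p>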
<p class="smalltext">
Another technique for defining one macroinstruction by another is to
use the <span class="smallcode">fix</span> directive, which becomes useful when some macroinstruction only
begins the definition of another one, without closing it. For example:
</p>
<pre class="smallcode">    macro tmacro [params]
     {
      common macro params {
     }

    MACRO fix tmacro
    ENDM fix }
</pre>
<p class="smalltext">
defines an alternative syntax for defining macroinstructions, which looks like:
</p>
<pre class="smallcode">    MACRO stoschar char
        mov al,char
        stosb
    ENDM
</pre>
<p class="smalltext">
Note that symbol that has such customized definition must be defined with <span class="smallcode">fix</span>
directive, because only the prioritized symbolic constants are processed before
the preprocessor looks for the <span class="smallcode">}</span> character while defining the macro. This
might be a problem if one needed to perform some additional tasks at the end
of such definition, but there is one more feature which helps in such cases.
Namely it is possible to put any directive, instruction or macroinstruction
just after the <span class="smallcode">}</span> character that ends the macroinstruction and it will be
processed in the same way as if it was put in the next line.
</p>

<p><b>
<a name="2.3.4" class="smalltext">2.3.4  Structures</a>
</b></p>

<p class="smalltext">
<span class="smallcode">struc</span> directive is a special variant of <span class="smallcode">macro</span> directive that is used to
define data structures. Macroinstruction defined using the <span class="smallcode">struc</span> directive
must be preceded by a label (like the data definition directive) when it's
used. This label will be also attached at the beginning of every name starting
with dot in the contents of macroinstruction. The macroinstruction defined
using the <span class="smallcode">struc</span> directive can have the same name as some other
macroinstruction defined using the <span class="smallcode">macro</span> directive, structure
macroinstruction won't prevent the standard macroinstruction being processed
when there is no label before it and vice versa. All the rules and features concerning
standard macroinstructions apply to structure macroinstructions.
</p>
<p class="smalltext">
Here is the sample of structure macroinstruction:
</p>
<pre class="smallcode">    struc point x,y
     {
        .x dw x
        .y dw y
     }
</pre>
<p class="smalltext">
For example <span class="smallcode">my point 7,11</span> will define structure labeled <span class="smallcode">my</span>, consisting of
two variables: <span class="smallcode">my.x</span> with value 7 and <span class="smallcode">my.y</span> with value 11.
</p>
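<p class="smalltext">
Each use of the structure macroinstruction creates a separate instance with
its own set of labels. A short sketch, with instance names chosen only for
illustration:
</p>
<pre class="smallcode">    origin point 0,0         ; defines origin, origin.x and origin.y
    corner point 640,480     ; defines corner, corner.x and corner.y

    mov ax,[corner.x]        ; loads the value 640
    mov dx,[origin.y]        ; loads the value 0
</pre>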
<p class="smalltext">
If somewhere inside the definition of structure the name consisting of a
single dot is found, it is replaced by the name of the label for the given
instance of structure and this label will not be defined automatically in
such case, allowing to completely customize the definition. The following
example utilizes this feature to extend the data definition directive <span class="smallcode">db</span>
with ability to calculate the size of defined data:
</p>
<pre class="smallcode">    struc db [data]
     {
       common
        . db data
        .size = $ - .
     }
</pre>
<p class="smalltext">
With such definition <span class="smallcode">msg db 'Hello!',13,10</span> will define also
<span class="smallcode">msg.size</span> constant, equal to the size of defined data in bytes.
</p>
<p class="smalltext">
Defining data structures addressed by registers or absolute values should be
done using the <span class="smallcode">virtual</span> directive with structure macroinstruction
(see <a href="#2.2.4">2.2.4</a>).
</p>
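<p class="smalltext">
As a sketch of this technique, assuming the <span class="smallcode">point</span> structure defined above
and a pointer to such structure held in the <span class="smallcode">ebx</span> register (the label <span class="smallcode">pt</span>
is a made-up name):
</p>
<pre class="smallcode">    virtual at ebx
      pt point ?,?           ; no data is put into the output here
    end virtual

    mov ax,[pt.x]            ; same as mov ax,[ebx]
    mov dx,[pt.y]            ; same as mov dx,[ebx+2]
</pre>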
<p class="smalltext">
<span class="smallcode">restruc</span> directive removes the last definition of the structure, just like
<span class="smallcode">purge</span> does with macroinstructions and <span class="smallcode">restore</span> with symbolic constants.
It also has the same syntax - should be followed by one or more names of
structure macroinstructions, separated with commas.
</p>

<p><b>
<a name="2.3.5" class="smalltext">2.3.5  Repeating macroinstructions</a>
</b></p>

<p class="smalltext">
The <span class="smallcode">rept</span> directive is a special kind of macroinstruction, which makes given
amount of duplicates of the block enclosed with braces. The basic syntax is
<span class="smallcode">rept</span> directive followed by number (it cannot be an expression, since
preprocessor doesn't do calculations, if you need repetitions based on values
calculated by assembler, use one of the code repeating directives that are
processed by assembler, see <a href="#2.2.3">2.2.3</a>), and then block of source enclosed between
the <span class="smallcode">{</span> and <span class="smallcode">}</span> characters. The simplest example:
</p>
<pre class="smallcode">    rept 5 { in al,dx }
</pre>
<p class="smalltext">
will make five duplicates of the <span class="smallcode">in al,dx</span> line. The block of instructions
is defined in the same way as for the standard macroinstruction and any
special operators and directives which can be used only inside
macroinstructions are also allowed here. When the given count is zero, the
block is simply skipped, as if you defined macroinstruction but never used
it. The number of repetitions can be followed by the name of counter symbol, which will get replaced
symbolically with the number of duplicate currently generated. So this:
</p>
<pre class="smallcode">    rept 3 counter
     {
        byte#counter db counter
     }
</pre>
<p class="smalltext">
will generate lines:
</p>
<pre class="smallcode">    byte1 db 1
    byte2 db 2
    byte3 db 3
</pre>
<p class="smalltext">
The repetition mechanism applied to <span class="smallcode">rept</span> blocks is the same as the one used
to process multiple groups of arguments for macroinstructions, so directives
like <span class="smallcode">forward</span>, <span class="smallcode">common</span> and <span class="smallcode">reverse</span> can be used in their usual meaning.
Thus such macroinstruction:
</p>
<pre class="smallcode">    rept 7 num { reverse display `num }
</pre>
<p class="smalltext">
will display digits from 7 to 1 as text. The <span class="smallcode">local</span> directive behaves in the
same way as inside macroinstruction with multiple groups of arguments, so:
</p>
<pre class="smallcode">    rept 21
     {
       local label
       label: loop label
     }
</pre>
<p class="smalltext">
will generate unique label for each duplicate.
</p>
<p class="smalltext">
The counter symbol by default counts from 1, but you can declare different
base value by placing the number preceded by colon immediately after the name
of counter. For example:
</p>
<pre class="smallcode">    rept 8 n:0 { pxor xmm#n,xmm#n }
</pre>
<p class="smalltext">
will generate code which will clear the contents of eight SSE registers.
You can define multiple counters separated with commas, and each one can have
different base.
</p>
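<p class="smalltext">
A small sketch with two counters, assuming each counter is written as a name
optionally followed by a colon and its base (the names <span class="smallcode">i</span> and <span class="smallcode">j</span> are
arbitrary):
</p>
<pre class="smallcode">    rept 3 i:0,j:10 { db i,j }

    ; expected to generate:
    ;
    ;   db 0,10
    ;   db 1,11
    ;   db 2,12
</pre>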
<p class="smalltext">
The <span class="smallcode">irp</span> directive iterates the single argument through the given list of
parameters. The syntax is <span class="smallcode">irp</span> followed by the argument name, then the comma
and then the list of parameters. The parameters are specified in the same
way like in the invocation of standard macroinstruction, so they have to be
separated with commas and each one can be enclosed with the <span class="smallcode">&lt;</span> and <span class="smallcode">&gt;</span>
characters. Also the name of argument may be followed by <span class="smallcode">*</span> to mark that it
cannot get an empty value. Such block:
</p>
<pre class="smallcode">   irp value, 2,3,5
    { db value }
</pre>
<p class="smalltext">
will generate lines:
</p>
<pre class="smallcode">   db 2
   db 3
   db 5
</pre>
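<p class="smalltext">
When a single parameter has to contain commas, it can be enclosed with the
<span class="smallcode">&lt;</span> and <span class="smallcode">&gt;</span> characters, as in this sketch:
</p>
<pre class="smallcode">   irp pair, &lt;1,2&gt;,&lt;3,4&gt;
    { dw pair }

   ; generates:
   ;
   ;   dw 1,2
   ;   dw 3,4
</pre>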
<p class="smalltext">
The <span class="smallcode">irps</span> directive iterates through the given list of symbols, it should
be followed by the argument name, then the comma and then the sequence of any
symbols. Each symbol in this sequence, no matter whether it is the name
symbol, symbol character or quoted string, becomes an argument value for one
iteration. If there are no symbols following the comma, no iteration is done
at all. This example:
</p>
<pre class="smallcode">   irps reg, al bx ecx
    { xor reg,reg }
</pre>
<p class="smalltext">
will generate lines:
</p>
<pre class="smallcode">   xor al,al
   xor bx,bx
   xor ecx,ecx
</pre>
<p class="smalltext">
The blocks defined by the <span class="smallcode">irp</span> and <span class="smallcode">irps</span> directives are also processed in
the same way as any macroinstructions, so operators and directives specific
to macroinstructions may be freely used also in this case.
</p>
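<p class="smalltext">
For instance, the <span class="smallcode">forward</span> and <span class="smallcode">reverse</span> directives can be combined in one
block to save and restore a set of registers. This is only an illustrative
sketch:
</p>
<pre class="smallcode">   irps reg, eax ebx ecx
    {
      forward push reg
      reverse pop reg
    }

   ; generates:
   ;
   ;   push eax
   ;   push ebx
   ;   push ecx
   ;   pop ecx
   ;   pop ebx
   ;   pop eax
</pre>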

<p><b>
<a name="2.3.6" class="smalltext">2.3.6  Conditional preprocessing</a>
</b></p>

<p class="smalltext">
<span class="smallcode">match</span> directive causes some block of source to be preprocessed and passed
to assembler only when the given sequence of symbols matches the specified
pattern. The pattern comes first, ended with comma, then the symbols
that have to be matched with the pattern, and finally the block of
source, enclosed within braces as macroinstruction.
There are a few rules for building the pattern. The first is
that any symbol character and any quoted string is matched exactly as is. In this example:
</p>
<pre class="smallcode">    match +,+ { include 'first.inc' }
    match +,- { include 'second.inc' }
</pre>
<p class="smalltext">
the first file will get included, since <span class="smallcode">+</span> after comma matches the <span class="smallcode">+</span> in
pattern, and the second file won't be included, since there is no match.
</p>
<p class="smalltext">
To match any other symbol literally, it has to be preceded by <span class="smallcode">=</span> character
in the pattern. Also to match the <span class="smallcode">=</span> character itself, or the comma, the
<span class="smallcode">==</span> and <span class="smallcode">=,</span> constructions have to be used. For example the <span class="smallcode">=a==</span> pattern
will match the <span class="smallcode">a=</span> sequence.
</p>
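<p class="smalltext">
A sketch of matching a name symbol literally, here the <span class="smallcode">byte</span> keyword,
together with a name <span class="smallcode">x</span> that captures whatever follows it:
</p>
<pre class="smallcode">    match =byte x, byte 5 { db x }    ; matches, generates db 5
    match =byte x, word 5 { db x }    ; no match, nothing is generated
</pre>
<p class="smalltext">
Without the <span class="smallcode">=</span> character the <span class="smallcode">byte</span> in the pattern would be treated as
a name that matches any symbols.
</p>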
<p class="smalltext">
If some name symbol is placed in the pattern, it matches any sequence
consisting of at least one symbol and then this name is replaced with the
matched sequence everywhere inside the following block, analogously to the
parameters of macroinstruction. For instance:
</p>
<pre class="smallcode">    match a-b, 0-7
     { dw a,b-a }
</pre>
<p class="smalltext">
will generate the <span class="smallcode">dw 0,7-0</span> instruction. Each name is always matched with
as few symbols as possible, leaving the rest for the following ones, so in
this case:
</p>
<pre class="smallcode">    match a b, 1+2+3 { db a }
</pre>
<p class="smalltext">
the <span class="smallcode">a</span> name will match the <span class="smallcode">1</span> symbol, leaving the <span class="smallcode">+2+3</span> sequence to be
matched with <span class="smallcode">b</span>. But in this case:
</p>
<pre class="smallcode">    match a b, 1 { db a }
</pre>
<p class="smalltext">
there will be nothing left for <span class="smallcode">b</span> to match, so the block won't get processed
at all.
</p>
<p class="smalltext">
The block of source defined by match is processed in the same way as any
macroinstruction, so any operators specific to macroinstructions can be used
also in this case.
</p>
<p class="smalltext">
What makes <span class="smallcode">match</span> directive more useful is the fact, that it replaces the
symbolic constants with their values in the matched sequence of symbols (that
is everywhere after comma up to the beginning of the source block) before
performing the match. Thanks to this it can be used for example to process
some block of source under the condition that some symbolic constant has the
given value, like:
</p>
<pre class="smallcode">    match =TRUE, DEBUG { include 'debug.inc' }
</pre>
<p class="smalltext">
which will include the file only when the symbolic constant <span class="smallcode">DEBUG</span> was
defined with value <span class="smallcode">TRUE</span>.
</p>
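<p class="smalltext">
A slightly extended sketch, with the <span class="smallcode">DEBUG</span> constant defined explicitly
and a second <span class="smallcode">match</span> covering the opposite setting (the displayed texts are
made up for illustration):
</p>
<pre class="smallcode">    DEBUG equ TRUE

    match =TRUE, DEBUG { display 'debug build',13,10 }
    match =FALSE, DEBUG { display 'release build',13,10 }
</pre>
<p class="smalltext">
Only the first block is processed here. When <span class="smallcode">DEBUG</span> is not defined as a
symbolic constant at all, neither of the patterns matches.
</p>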

<p><b>
<a name="2.3.7" class="smalltext">2.3.7  Order of processing</a>
</b></p>

<p class="smalltext">
When combining various features of the preprocessor, it's important to know
the order in which they are processed. As already noted, the
<span class="smallcode">fix</span> directive and the replacements defined with it have the highest priority. This
is done completely before doing any other preprocessing, therefore this
piece of source:
</p>
<pre class="smallcode">    V fix {
      macro empty
       V
    V fix }
       V
</pre>
<p class="smalltext">
becomes a valid definition of an empty macroinstruction. It can be interpreted
that the <span class="smallcode">fix</span> directive and prioritized symbolic constants are processed in
a separate stage, and all other preprocessing is done after on the resulting
source.
</p>
<p class="smalltext">
The standard preprocessing that comes after, on each line begins with
recognition of the first symbol. It begins with checking for the preprocessor
directives, and when none of them is detected, preprocessor checks whether the
first symbol is macroinstruction. If no macroinstruction is found, it moves
to the second symbol of line, and again begins with checking for directives,
which in this case is only the <span class="smallcode">equ</span> directive, as this is the only one that
occurs as the second symbol in line. If there's no directive, the second
symbol is checked for the case of structure macroinstruction and when none
of those checks gives the positive result, the symbolic constants are replaced
with their values and such line is passed to the assembler.
</p>
<p class="smalltext">
To see it on the example, assume that there is defined the macroinstruction
called <span class="smallcode">foo</span> and the structure macroinstruction called <span class="smallcode">bar</span>. Those lines:
</p>
<pre class="smallcode">    foo equ
    foo bar
</pre>
<p class="smalltext">
would be then both interpreted as invocations of macroinstruction <span class="smallcode">foo</span>, since
the meaning of the first symbol overrides the meaning of second one.
</p>
<p class="smalltext">
When the macroinstruction generates the new lines from its definition block,
in every line it first scans for macroinstruction directives, and interprets
them accordingly. All the other content in the definition block is used to
brew the new lines, replacing the parameters with their values and then
processing the symbol escaping and <span class="smallcode">#</span> and <span class="smallcode">`</span> operators.
The conversion operator has the higher
priority than concatenation and if any of them operates on the escaped symbol,
the escaping is cancelled before finishing the operation. After this is
completed, the newly generated line goes through the standard preprocessing,
as described above.
</p>
<p class="smalltext">
Though the symbolic constants are usually only replaced in the lines, where
no preprocessor directives nor macroinstructions have been found, there are some
special cases where those replacements are performed in the parts of lines
containing directives. First one is the definition of symbolic constant, where
the replacements are done everywhere after the <span class="smallcode">equ</span> keyword and the resulting
value is then assigned to the new constant (see <a href="#2.3.2">2.3.2</a>). The second such case
is the <span class="smallcode">match</span> directive, where the replacements are done in the symbols
following comma before matching them with pattern. These features can be used
for example to maintain the lists, like this set of definitions:
</p>
<pre class="smallcode">    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }
</pre>
<p class="smalltext">
The <span class="smallcode">list</span> constant is here initialized with empty value, and the <span class="smallcode">append</span>
macroinstruction can be used to add the new items into this list, separating
them with commas. The first match in this macroinstruction occurs only when
the value of list is not empty (see <a href="#2.3.6">2.3.6</a>), in such case the new value for the
list is the previous one with the comma and the new item appended at the end.
The second match happens only when the list is still empty, and in such case
the list is defined to contain just the new item. So starting with the empty
list, the <span class="smallcode">append 1</span> would define <span class="smallcode">list equ 1</span> and the <span class="smallcode">append 2</span> following it
would define <span class="smallcode">list equ 1,2</span>. One might then need to use this list as the
parameters to some macroinstruction. But it cannot be done directly - if <span class="smallcode">foo</span>
is the macroinstruction, then <span class="smallcode">foo list</span> would just pass the <span class="smallcode">list</span> symbol
as a parameter to macro, since symbolic constants are not unrolled at this
stage. For this purpose again <span class="smallcode">match</span> directive comes in handy:
</p>
<pre class="smallcode">    match params, list { foo params }
</pre>
<p class="smalltext">
The value of <span class="smallcode">list</span>, if it's not empty, matches the <span class="smallcode">params</span> keyword, which is
then replaced with matched value when generating the new lines defined by the
block enclosed with braces. So if the <span class="smallcode">list</span> had value <span class="smallcode">1,2</span>, the above line
would generate the line containing <span class="smallcode">foo 1,2</span>, which would then go through the
standard preprocessing.
</p>
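<p class="smalltext">
Putting these pieces together, a short sketch that builds a list with the
<span class="smallcode">append</span> macroinstruction defined above and then uses it as operands of a
data directive:
</p>
<pre class="smallcode">    append 1                   ; list equ 1
    append 2                   ; list equ 1,2
    append 3                   ; list equ 1,2,3

    match params, list { db params }    ; generates db 1,2,3
</pre>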
<p class="smalltext">
There is one more special case - when preprocessor goes to checking the
second symbol in the line and it happens to be the colon character (what is
then interpreted by assembler as definition of a label), it stops in this
place and finishes the preprocessing of the first symbol (so if it's the
symbolic constant it gets unrolled) and if it still appears to be the label,
it performs the standard preprocessing starting from the place after the
label. This allows to place preprocessor directives and macroinstructions
after the labels, analogously to the instructions and directives processed
by assembler, like:
</p>
<pre class="smallcode">    start: include 'start.inc'
</pre>
<p class="smalltext">
However if the label becomes broken during preprocessing (for example when
it is the symbolic constant with empty value), only replacing of the symbolic
constants is continued for the rest of line.
</p>



<p class="smalltext">
It should be remembered, that the jobs performed by preprocessor are the
preliminary operations on the text symbols, which are done in a simple
single pass before the main process of assembly. The text that is the
result of preprocessing is passed to assembler, and it then does its
multiple passes on it. Thus the control directives, which are recognized and
processed only by the assembler - as they are dependent on the numerical
values that may even vary between passes - are not recognized in any way by
the preprocessor and have no effect on the preprocessing. Consider this
example source:
</p>
<pre class="smallcode">    if 0
    a = 1
    b equ 2
    end if
    dd b
</pre>
<p class="smalltext">
When it is preprocessed, the only directive that is recognized by the
preprocessor is the <span class="smallcode">equ</span>, which defines symbolic constant <span class="smallcode">b</span>, so later
in the source the <span class="smallcode">b</span> symbol is replaced with the value <span class="smallcode">2</span>. Except for this
replacement, the other lines are passed unchanged to the assembler. So
after preprocessing the above source becomes:
</p>
<pre class="smallcode">    if 0
    a = 1
    end if
    dd 2
</pre>
<p class="smalltext">
Now when assembler processes it, the condition for the <span class="smallcode">if</span> is false, and
the <span class="smallcode">a</span> constant doesn't get defined. However symbolic constant <span class="smallcode">b</span> was
processed normally, even though its definition was put just next to the one
of <span class="smallcode">a</span>. So because of the possible confusion you should be very careful
every time when mixing the features of preprocessor and assembler - always
try to imagine what your source will become after the preprocessing, and
thus what the assembler will see and do its multiple passes on.
</p>
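<p class="smalltext">
When the definition of a symbolic constant really has to depend on some
condition, that condition can be checked by the preprocessor itself with the
<span class="smallcode">match</span> directive (see <a href="#2.3.6">2.3.6</a>). A minimal sketch, where <span class="smallcode">ENABLE</span> is an
assumed name of a symbolic constant:
</p>
<pre class="smallcode">    ENABLE equ 1

    match =1, ENABLE { b equ 2 }   ; processed only when ENABLE has value 1
    dd b                           ; becomes dd 2 in that case
</pre>
<p class="smalltext">
If <span class="smallcode">ENABLE</span> had any other value, <span class="smallcode">b</span> would not be defined as a symbolic
constant and the last line would reach the assembler unchanged.
</p>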

<p><b>
<a name="2.4" class="mediumtext">2.4  Formatter directives</a>
</b></p>

<p class="smalltext">
These directives are actually also a kind of control directives, with the
purpose of controlling the format of generated code.
</p>
<p class="smalltext">
<span class="smallcode">format</span> directive followed by the format identifier allows to select the
output format. This directive should be put at the beginning of the source.
Default output format is a flat binary file, it can also be selected by using
<span class="smallcode">format binary</span> directive.
This directive can be followed by the <span class="smallcode">as</span> keyword
and the quoted string specifying the default file extension for the output
file. Unless the output file name was specified from the command line,
assembler will use this extension when generating the output file.
</p>
<p class="smalltext">
The <span class="smallcode">use16</span> and <span class="smallcode">use32</span> directives force the assembler to generate 16-bit or
32-bit code, overriding the default setting for the selected output format.
<span class="smallcode">use64</span> enables generating the code for the long mode of x86-64 processors.
</p>
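<p class="smalltext">
The effect of these directives can be sketched with the same two instructions assembled under
both settings. This is only an illustration: the instructions are arbitrary, and the comments
describe the general x86 rule about the operand-size prefix rather than any particular listing.
</p>
<pre class="smallcode">    use16
    mov ax,bx       ; 16-bit operands are the native size here
    mov eax,ebx     ; 32-bit operands need the 66h operand-size prefix
    use32
    mov eax,ebx     ; now 32-bit operands are the native size
    mov ax,bx       ; and 16-bit operands need the 66h prefix
</pre>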
<p class="smalltext">
The different output formats are described below, together with the directives
specific to each of them.
</p>

<p><b>
<a name="2.4.1" class="smalltext">2.4.1  MZ executable</a>
</b></p>

<p class="smalltext">
To select the MZ output format, use the <span class="smallcode">format MZ</span> directive. The default code
setting for this format is 16-bit.
</p>
<p class="smalltext">
The <span class="smallcode">segment</span> directive defines a new segment. It should be followed by a label,
whose value will be the number of the defined segment; optionally the <span class="smallcode">use16</span> or
<span class="smallcode">use32</span> word can follow to specify whether code in this segment should be
16-bit or 32-bit. The origin of the segment is aligned to a paragraph (16 bytes).
All the labels defined afterwards will have values relative to the beginning of this
segment.
</p>
<p class="smalltext">
The <span class="smallcode">entry</span> directive sets the entry point for the MZ executable. It should be
followed by the far address (name of the segment, colon and the offset inside
the segment) of the desired entry point.
</p>
<p class="smalltext">
The <span class="smallcode">stack</span> directive sets up the stack for the MZ executable. It can be followed by
a numerical expression specifying the size of the stack to be created automatically,
or by the far address of the initial stack frame when you want to set up the stack
manually. When no stack is defined, a stack of the default size of 4096 bytes will
be created.
</p>
<p class="smalltext">
The <span class="smallcode">heap</span> directive should be followed by a 16-bit value defining the maximum size
of the additional heap in paragraphs (this is heap in addition to stack and
undefined data). Use <span class="smallcode">heap 0</span> to always allocate only the memory the program really
needs. The default size of the heap is 65535.
</p>
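<p class="smalltext">
The directives described above can be put together as in the following sketch of a small
two-segment DOS program. The segment and label names (<span class="smallcode">main</span>, <span class="smallcode">text</span>,
<span class="smallcode">start</span>, <span class="smallcode">hello</span>) and the stack size are chosen only for this example;
the program prints a string with DOS function 9 and exits with function 4Ch.
</p>
<pre class="smallcode">    format MZ

    entry main:start            ; far address of the entry point
    stack 100h                  ; let the assembler create a 256-byte stack
    heap 0                      ; request no additional heap

    segment main                ; code segment, 16-bit by default

      start:
        mov ax,text             ; the segment label gives the segment number
        mov ds,ax
        mov dx,hello            ; offset relative to the start of its segment
        mov ah,9                ; DOS: write '$'-terminated string
        int 21h
        mov ax,4C00h            ; DOS: terminate with exit code 0
        int 21h

    segment text                ; data segment, aligned to a paragraph

      hello db 'Hello world!',24h
</pre>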

<p><b>
<a name="2.4.2" class="smalltext">2.4.2  Portable Executable</a>
</b></p>

<p class="smalltext">
To select the Portable Executable output format, use the <span class="smallcode">format PE</span> directive. It
can be followed by additional format settings. First comes the target subsystem
setting, which can be <span class="smallcode">console</span> or <span class="smallcode">GUI</span> for Windows applications, <span class="smallcode">native</span>
for Windows drivers, or <span class="smallcode">EFI</span>, <span class="smallcode">EFIboot</span> or <span class="smallcode">EFIruntime</span> for the UEFI. The <span class="smallcode">DLL</span>
keyword following the subsystem setting marks the output file as a dynamic link
library. Next can follow the <span class="smallcode">at</span> operator and a numerical expression
specifying the base of the PE image, and then optionally the <span class="smallcode">on</span> operator followed by
a quoted string containing a file name, which selects a custom MZ stub for the PE program
(when the specified file is not an MZ executable, it is treated as a flat binary
executable file and converted into MZ format). The default code setting for
this format is 32-bit. An example of a fully featured PE format declaration
(the <span class="smallcode">4.0</span> after the subsystem is the minimum system version the executable targets):
</p>
<pre class="smallcode">    format PE GUI 4.0 DLL at 7000000h on 'stub.exe'
</pre>
<p class="smalltext">
To create a PE file for the x86-64 architecture, use the <span class="smallcode">PE64</span> keyword instead of
<span class="smallcode">PE</span> in the format declaration; in such case long mode code is generated
by default.
</p>
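<p class="smalltext">
A minimal sketch of such a declaration is shown below. The label name and the instructions are
assumptions made for this example and do not form a useful program; they only show that 64-bit
registers can be used right away, without any <span class="smallcode">use64</span> directive.
</p>
<pre class="smallcode">    format PE64 console
    entry start

    start:
        xor rax,rax             ; 64-bit registers are available by default
        ret
</pre>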
<p class="smalltext">
The <span class="smallcode">section</span> directive defines a new section. It should be followed by a quoted
string defining the name of the section, then one or more section flags can
follow. Available flags are: <span class="smallcode">code</span>, <span class="smallcode">data</span>, <span class="smallcode">readable</span>, <span class="smallcode">writeable</span>,
<span class="smallcode">executable</span>, <span class="smallcode">shareable</span>, <span class="smallcode">discardable</span>, <span class="smallcode">notpageable</span>.
The origin of a section is aligned to a page (4096 bytes). Example declaration of a PE section:
</p>
<pre class="smallcode">    section '.text' code readable executable
</pre>
<p class="smalltext">
Along with the flags, one of the special PE data identifiers can also be specified to mark the whole
section as special data. The possible identifiers are <span class="smallcode">export</span>, <span class="smallcode">import</span>,
<span class="smallcode">resource</span> and <span class="smallcode">fixups</span>. If the section is marked to contain fixups, they are
generated automatically and no more data needs to be defined in this section.
Resource data can also be generated automatically from a resource file; this
is achieved by writing the <span class="smallcode">from</span> operator and a quoted file name after the
<span class="smallcode">resource</span> identifier. Below are examples of sections containing some special PE data:
</p>
<pre class="smallcode">    section '.reloc' data discardable fixups
    section '.rsrc' data readable resource from 'my.res'
</pre>
<p class="smalltext">
The <span class="smallcode">entry</span> directive sets the entry point for the Portable Executable; the value of
the entry point should follow.
</p>
<p class="smalltext">
The <span class="smallcode">stack</span> directive sets up the size of the stack for the Portable Executable. The value
of the stack reserve size should follow, optionally followed by the value of the stack commit
separated with a comma. When the stack is not defined, it is set by
default to the size of 4096 bytes.
</p>
<p class="smalltext">
The <span class="smallcode">heap</span> directive chooses the size of the heap for the Portable Executable. The value of
the heap reserve size should follow, optionally followed by the value of the heap commit separated
with a comma. When no heap is defined, it is set by default to the size
of 65536 bytes; when the size of the heap commit is unspecified, it is by default set
to zero.
</p>
<p class="smalltext">
The <span class="smallcode">data</span> directive begins the definition of special PE data. It should be
followed by one of the data identifiers (<span class="smallcode">export</span>, <span class="smallcode">import</span>, <span class="smallcode">resource</span> or
<span class="smallcode">fixups</span>) or by the number of the data entry in the PE header. The data should be
defined in the next lines, ended with the <span class="smallcode">end data</span> directive. When the fixups data
definition is chosen, the fixups are generated automatically and no more data needs
to be defined there. The same applies to the resource data when the <span class="smallcode">resource</span>
identifier is followed by the <span class="smallcode">from</span> operator and a quoted file name; in such case
the data is taken from the given resource file.
</p>
<p class="smalltext">
The <span class="smallcode">rva</span> operator can be used inside numerical expressions to obtain
the RVA (relative virtual address) of the item addressed by the value it is applied to.
</p>
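<p class="smalltext">
To show how the directives of this format work together, below is a sketch of a minimal console
program that does nothing but call <span class="smallcode">ExitProcess</span> from KERNEL32.DLL. The label names
(<span class="smallcode">start</span>, <span class="smallcode">kernel_table</span>, <span class="smallcode">kernel_name</span>, <span class="smallcode">_ExitProcess</span>) and the
stack and heap sizes are assumptions made for this example. The import table is written by hand as
a single import directory entry followed by a terminating empty one, with the <span class="smallcode">rva</span>
operator converting labels into the addresses the PE loader expects.
</p>
<pre class="smallcode">    format PE console
    entry start
    stack 10000h,1000h          ; stack reserve, stack commit
    heap 10000h                 ; heap reserve, commit defaults to zero

    start:
        push 0                  ; exit code
        call [ExitProcess]      ; call through the import address table

    data import

      dd 0,0,0,rva kernel_name,rva kernel_table   ; directory entry for KERNEL32
      dd 0,0,0,0,0                                ; terminating empty entry

      kernel_table:
        ExitProcess dd rva _ExitProcess           ; filled with the address by loader
        dd 0                                      ; end of table

      kernel_name db 'KERNEL32.DLL',0

      _ExitProcess dw 0                           ; hint
        db 'ExitProcess',0                        ; name of imported function

    end data
</pre>
<p class="smalltext">
Since no <span class="smallcode">section</span> directive is used in this sketch, the code and the import data end
up together in one section; in a larger program they would usually be placed in separate sections
with appropriate flags.
</p>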


<p><b>
<a name="2.4.3" class="smalltext">2.4.3  Common Object File Format</a>
</b></p>

<p class="smalltext">
To select the Common Object File Format, use the <span class="smallcode">format COFF</span> or <span class="smallcode">format MS COFF</span>
directive, depending on whether you want to create a classic or Microsoft's COFF file. The
default code setting for this format is 32-bit. To create a file in
Microsoft's COFF format for the x86-64 architecture, use the <span class="smallcode">format MS64 COFF</span>
setting; in such case long mode code is generated by default.
</p>
<p class="smalltext">
The <span class="smallcode">section</span> directive defines a new section. It should be followed by a quoted
string defining the name of the section, then one or more section flags can
follow.
Section flags available for both COFF variants are <span class="smallcode">code</span> and <span class="smallcode">data</span>,
while flags <span class="smallcode">readable</span>, <span class="smallcode">writeable</span>, <span class="smallcode">executable</span>, <span class="smallcode">shareable</span>, <span class="smallcode">discardable</span>,
<span class="smallcode">notpageable</span>, <span class="smallcode">linkremove</span> and <span class="smallcode">linkinfo</span> are available only with
Microsoft's COFF variant.
</p>
<p class="smalltext">
By default a section is aligned to a double word (four bytes). In the case of the Microsoft COFF variant another alignment
can be specified by providing the <span class="smallcode">align</span> operator followed by the alignment value
(any power of two up to 8192) among the section flags.
</p>
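<p class="smalltext">
For example, a code section of a Microsoft COFF object aligned to 16 bytes might be declared as
shown below; the section name and the alignment value are chosen only for illustration.
</p>
<pre class="smallcode">    format MS COFF
    section '.text' code readable executable align 16
</pre>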
<p class="smalltext">
The <span class="smallcode">extrn</span> directive defines an external symbol. It should be followed by the
name of the symbol and optionally the size operator specifying the size of the data
labeled by this symbol. The name of the symbol can also be preceded by a quoted
string containing the name of the external symbol and the <span class="smallcode">as</span> operator.
Some example declarations of external symbols:
</p>
<pre class="smallcode">    extrn exit
    extrn '__imp__MessageBoxA@16' as MessageBox:dword
</pre>
<p class="smalltext">
The <span class="smallcode">public</span> directive declares an existing symbol as public. It should be
followed by the name of the symbol; optionally it can be followed by the <span class="smallcode">as</span>
operator and a quoted string containing the name under which the symbol should be
available as public.
Some example declarations of public symbols:
</p>
<pre class="smallcode">    public main
    public start as '_start'
</pre>
<p class="smalltext">
Additionally, with the COFF format it is possible to specify an exported symbol as
static; this is done by preceding the name of the symbol with the <span class="smallcode">static</span> keyword.
</p>
<p class="smalltext">
When using Microsoft's COFF format, the <span class="smallcode">rva</span> operator can be used
inside numerical expressions to obtain the RVA of the item addressed by the
value it is applied to.
</p>
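<p class="smalltext">
The following sketch shows how these directives might be combined into a small object file to be
linked with a C runtime library. The decorated names <span class="smallcode">_printf</span> and <span class="smallcode">_main</span> are an
assumption about the naming convention of the linker and library in use, and the label
<span class="smallcode">msg</span> and the section names are chosen only for this example.
</p>
<pre class="smallcode">    format MS COFF

    extrn '_printf' as printf       ; external C function under a local name
    public main as '_main'          ; export entry under the decorated name

    section '.text' code readable executable

      main:
        push msg                    ; argument for printf (cdecl convention)
        call printf
        add esp,4                   ; caller removes the argument
        xor eax,eax                 ; return 0
        ret

    section '.data' data readable writeable

      msg db 'Hello from fasm!',10,0
</pre>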

<p><b>
<a name="2.4.4" class="smalltext">2.4.4  Executable and Linkable Format</a>
</b></p>

<p class="smalltext">
To select the ELF output format, use the <span class="smallcode">format ELF</span> directive. The default code
setting for this format is 32-bit. To create an ELF file for the x86-64
architecture, use the <span class="smallcode">format ELF64</span> directive; in such case long mode code is
generated by default.
</p>
<p class="smalltext">
The <span class="smallcode">section</span> directive defines a new section. It should be followed by a quoted
string defining the name of the section, then can follow one or both of the
<span class="smallcode">executable</span> and <span class="smallcode">writeable</span> flags, and optionally also the <span class="smallcode">align</span> operator followed
by a number specifying the alignment of the section (it has to be a power of
two). If no alignment is specified, the default value is used, which is 4 or 8,
depending on which format variant has been chosen.
</p>
<p class="smalltext">
The <span class="smallcode">extrn</span> and <span class="smallcode">public</span> directives have the same meaning and syntax as when the
COFF output format is selected (described in the previous section).
</p>
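<p class="smalltext">
A sketch of a 32-bit ELF object using these directives could look as follows. It assumes that the
object will be linked with a C library providing <span class="smallcode">printf</span> and calling <span class="smallcode">main</span>;
the label <span class="smallcode">msg</span>, the section names and the alignment value are chosen only for this
example.
</p>
<pre class="smallcode">    format ELF

    extrn printf                    ; provided by the C library at link time
    public main                     ; called by the C startup code

    section '.text' executable align 16

      main:
        push msg                    ; argument for printf (cdecl convention)
        call printf
        add esp,4                   ; caller removes the argument
        xor eax,eax                 ; return 0
        ret

    section '.data' writeable

      msg db 'Hello from fasm!',10,0
</pre>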
<p class="smalltext">
The <span class="smallcode">rva</span> operator can be used also in the case of this format (however not
when the target architecture is x86-64). It converts the address into the offset
relative to the GOT (Global Offset Table), so it may be useful for creating position-independent
code. There is also a special <span class="smallcode">plt</span> operator, which allows you to call external
functions through the Procedure Linkage Table. You can even create an alias
for an external function that will make it always be called through the PLT, with
code like:
</p>
<pre class="smallcode">    extrn 'printf' as _printf
    printf = PLT _printf
</pre>
<p class="smalltext">
To create an executable file, follow the format choice directive with the <span class="smallcode">executable</span> keyword.
It allows you to use the <span class="smallcode">entry</span> directive followed by the value to set as the entry point of
the program. On the other hand it makes the <span class="smallcode">extrn</span> and
<span class="smallcode">public</span> directives unavailable, and instead of <span class="smallcode">section</span> the
<span class="smallcode">segment</span> directive should be used, followed only by one or more segment permission
flags. The origin of a segment is aligned to a page (4096 bytes), and the available
flags are: <span class="smallcode">readable</span>, <span class="smallcode">writeable</span> and <span class="smallcode">executable</span>.
</p>
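<p class="smalltext">
As an illustration, below is a sketch of a small 32-bit Linux executable built this way. It
assumes the classic <span class="smallcode">int 0x80</span> system call interface of 32-bit Linux (function 4 to write,
function 1 to exit); the label names <span class="smallcode">start</span>, <span class="smallcode">msg</span> and <span class="smallcode">msg_size</span> are
chosen only for this example.
</p>
<pre class="smallcode">    format ELF executable
    entry start

    segment readable executable

      start:
        mov eax,4                   ; sys_write
        mov ebx,1                   ; file descriptor 1, standard output
        mov ecx,msg                 ; address of the text
        mov edx,msg_size            ; length of the text
        int 0x80

        mov eax,1                   ; sys_exit
        xor ebx,ebx                 ; exit code 0
        int 0x80

    segment readable writeable

      msg db 'Hello world!',0xA
      msg_size = $-msg              ; length computed by the assembler
</pre>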


  </div>


  <p>
     Copyright © 2004-2009, <a href="mailto:tgrysztar@flatassembler.net">Tomasz Grysztar</a>.
  </p>

</body>
</html>